The Evolution of Highly Concurrent Web Services

This article looks at how highly concurrent Web services have evolved. We start with why the number of concurrent connections keeps growing, then cover front-end optimizations that reduce pressure on servers, and finally how Web servers themselves have evolved to save memory and CPU.
First, more and more concurrent connections
The number of concurrent connections that Web systems face has grown sharply in recent years, and high concurrency has become the norm, which poses a great challenge to Web systems. The simplest, crudest solution is to add machines and upgrade hardware. Although hardware keeps getting cheaper, blindly adding machines to absorb the growth in concurrency is very expensive; combining it with technical optimization is the more effective approach.

Why does the number of concurrent connections keep climbing? The user base of recent years has not actually grown exponentially, so that is not the main reason. The main reason is that the Web has become more complex and more interactive.
1. More page elements, more complex interaction

Web pages contain more and more elements and are richer than before; more resource elements mean more download requests. Interaction in Web systems is also becoming more complex, with interaction scenarios and frequency both greatly increased. Take the home page of "www.qq.com" as an example: one refresh triggers about 244 requests, and after the page opens, it keeps issuing periodic polling and reporting requests.

Today's HTTP requests usually use persistent connections (Connection: keep-alive) to avoid repeatedly creating and destroying connections. Once established, a connection is kept open for a period of time and reused by subsequent requests. But this brings a new problem: maintaining the connection occupies Web server resources, and an under-used connection is wasted resources. After a persistent connection is created, the first resources are transferred; thereafter there may be almost no data exchange, yet the system resources the connection occupies are not released until the timeout.
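For illustration (header values are examples, not from the article), a keep-alive exchange looks roughly like this:

```http
GET /index.html HTTP/1.1
Host: www.example.com
Connection: keep-alive

HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 1024
Connection: keep-alive
Keep-Alive: timeout=5, max=100
```

The Apache-style `Keep-Alive: timeout=5, max=100` reply means the server will hold the idle connection for 5 seconds and reuse it for up to 100 requests; until it times out, the connection occupies server resources whether or not any data flows.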
In addition, some Web features need long-lived connections by design, such as WebSocket.
2. Mainstream browsers open more connections

Facing ever richer Web resources, mainstream browsers have also raised the number of concurrent connections they open. For the same domain, early browsers generally opened only 1-2 download connections, while current mainstream browsers usually open 2-6. Increasing the browser's concurrent connections speeds up page loading when many resources must be downloaded; more connections help the browser load page elements, and if some connections hit "network congestion", the remaining download connections keep working.

This quietly increases the pressure on the Web back end, since more download connections mean more Web server resources occupied. At peak access times, a popular site readily forms a "high concurrency" scene, and these connections and requests consume large amounts of the server's CPU, memory and other resources, especially for pages with 100+ resources, which push browsers to use even more download connections.
Second, front-end optimization to reduce server pressure

To ease the pressure of "high concurrency", front end and back end must cooperate; optimizing both achieves the greatest effect. The Web front end, on the front line closest to the user, can reduce or lighten HTTP requests.
1. Reduce Web requests
The common implementation uses the Expires or Cache-Control: max-age header of the HTTP protocol to put static content into the browser's local cache; for a period of time afterwards the browser no longer requests the Web server and uses the local resource directly. HTML5's local storage (localStorage) can likewise serve as a powerful local data cache.

Once cached, this scheme sends no request to the Web server at all, which greatly reduces server pressure and gives a good user experience. However, it does nothing for a user's first visit, and it also sacrifices some freshness of the cached Web resources.
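A hedged example of such a response (header values illustrative):

```http
HTTP/1.1 200 OK
Content-Type: image/png
Cache-Control: max-age=86400
Expires: Thu, 01 Jan 2026 00:00:00 GMT
```

With these headers, the browser can serve the image from its local cache for the next 24 hours without contacting the server at all.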
2. Lightening Web requests

The browser's local cache has an expiration time; once it expires, the browser must ask the server again. At that point there are two possibilities:

(1) The server's resource content has not changed: the browser requests the Web resource, and the server replies "you can keep using your local cache". (Communication occurs, but the Web server only needs to send a small reply.)

(2) The server's file or content has been updated: the browser requests the Web resource, and the Web server transmits the new content over the network. (Communication occurs, and the Web server must do the heavier transfer work.)

This negotiation is controlled through the Last-Modified or ETag headers of the HTTP protocol. If the content has not changed, the server returns 304 Not Modified, so the Web server does not have to do the costly work of transmitting the complete file on every request; a small HTTP reply achieves the same effect.
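A hedged sketch of the exchange (date and ETag values made up for illustration):

```http
GET /logo.png HTTP/1.1
Host: www.example.com
If-Modified-Since: Mon, 12 May 2014 02:00:00 GMT
If-None-Match: "5d8c72a5edda8d6a"

HTTP/1.1 304 Not Modified
ETag: "5d8c72a5edda8d6a"
```

The 304 response carries no body, so the browser reuses its cached copy and the server avoids the full transfer.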
Although such requests "lighten" the pressure on the Web server, the connection is still established and the request still occurs.
3. Merging page requests

Older Web developers will remember the days before Ajax became popular: most pages were output whole, the Web back end assembled the page content and returned it to the front end, and there were not so many separate requests. Generating static pages was a widely used optimization back then. Later the more interactive Ajax gradually took over, and the number of requests per page grew and grew.

Since mobile networks (2G/3G) are much slower than PC broadband, and some handsets are fairly low-end, a page with 100+ requests loads much more slowly there. So the direction of optimization swung back to merging page elements and reducing the number of requests:

(1) Merge HTML display content: embed CSS and JS directly into the HTML page instead of referencing them through links.

(2) Merge Ajax requests for dynamic content: combine, say, 10 Ajax queries into one batch information query.

(3) Merge small images: use CSS sprites (background-position offsets) to combine many small images into one. This optimization is also very common in PC-side Web optimization.

Merging requests reduces the number of data transfers, effectively turning several requests into one "batch" request. The optimizations above achieve the goal of "reducing" the pressure on the Web server by cutting the number of connections that must be established.
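As a hedged illustration of point (2), with a hypothetical /users endpoint: instead of issuing `GET /users?id=101`, `GET /users?id=102` and `GET /users?id=103` as three separate Ajax requests, the page issues a single batch query:

```http
GET /users?ids=101,102,103 HTTP/1.1
Host: api.example.com
```

The back end answers all three lookups in one response body, so only one connection/request cycle is spent instead of three.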
Third, saving the Web server's memory

With the front end optimized, we need to focus on the Web server itself. Memory is a very important resource for a Web server: more memory usually means more tasks can be held at the same time. Memory consumed by a Web service can be roughly divided into:

(1) Basic memory for maintaining a connection; when a process initializes, its basic modules are loaded into memory.

(2) Memory occupied by the buffers holding the data being transferred.

(3) Memory requested and used while the program executes.

If maintaining one connection consumes as little memory as possible, the same machine can keep more concurrent connections alive, and the Web server can support higher concurrency.

Apache (httpd) is a mature and venerable Web server, and its development and evolution have continuously pursued exactly this goal: reducing the memory occupied by the service in order to support more concurrency. Let's look at how memory was optimized through the evolution of Apache's working modes (MPMs).
1. Prefork MPM: multi-process mode

Prefork is Apache's most mature and stable working mode, and it is still widely used today. After the main process starts, it completes the basic initialization and then pre-generates a batch of child processes via fork (each child copies the parent's memory space and does not need to repeat the initialization); the children then wait to serve requests. Pre-generating them avoids the cost of frequently creating and destroying processes. The advantage of multiple processes is isolation: memory data does not interfere between processes, and one process terminating abnormally does not affect the others. In terms of memory, however, each httpd child occupies a lot, because the child's memory data is copied from the parent; roughly speaking, a great deal of "duplicated data" sits in memory. That puts a ceiling on the maximum number of children we can create. Facing high concurrency, the many keep-alive persistent connections keep children "occupied", which can easily exhaust the pool of available children. Prefork is therefore not well suited to high-concurrency scenarios. (A minimal sketch of the pattern follows the pros and cons below.)

Advantages: mature and stable, compatible with all old and new modules, and no need to worry about thread safety. (For example, the commonly used mod_php, which compiles PHP in as an Apache submodule, does not need to be thread-safe.)

Disadvantages: each service process occupies a lot of memory.
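A minimal C sketch of the pre-fork pattern (illustrative only, not Apache's actual code; the port and pool size are arbitrary): the master opens the listening socket, forks a fixed pool of children, and each child blocks in accept() on the inherited socket.

```c
/* Pre-fork sketch: children inherit the listening socket from the
 * master, so no process is created or destroyed per request. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

#define NUM_CHILDREN 4

static void child_loop(int listen_fd) {
    for (;;) {
        int conn = accept(listen_fd, NULL, NULL); /* block until a request arrives */
        if (conn < 0) continue;
        const char *resp = "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok";
        write(conn, resp, strlen(resp));
        close(conn);
    }
}

int main(void) {
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);
    bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(listen_fd, 128);

    /* fork() duplicates the parent's memory space: this is exactly the
     * "duplicated data" cost that limits how many children can run. */
    for (int i = 0; i < NUM_CHILDREN; i++) {
        if (fork() == 0) {
            child_loop(listen_fd);
            exit(0);
        }
    }
    for (;;) wait(NULL); /* the master only reaps children */
}
```

A keep-alive connection parked in one of these children pins the entire process, which is why prefork runs out of capacity quickly under high concurrency.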
2. Worker MPM: mixed multi-process and multi-thread mode

Compared with prefork, worker mode mixes processes and threads. It also pre-forks child processes (a small number), and each child then creates several threads (including one listener thread); each incoming request is assigned to a thread to serve (a sketch follows below). Threads are lighter than processes because threads share the parent process's memory space, so the memory footprint drops. In high-concurrency scenarios, worker has more threads available precisely because it is more memory-efficient than prefork.

It does not, however, solve the problem of keep-alive persistent connections "hogging" workers; the thing being hogged has merely become a lighter thread.

Some people may find this strange: why not go fully multithreaded here, rather than also keeping multiple processes? Because stability must be considered too. If a thread crashes, it brings down the whole process and the other, healthy threads inside it. Were everything one multithreaded process, a single crashing thread would wipe out the entire Apache service; in the current mode, only part of Apache's service is affected rather than the whole of it.

Threads sharing the parent process's memory space reduces the memory footprint but creates a new problem: "thread safety", the "race conditions" caused by multiple threads modifying shared resources. It also forces the modules we use to support thread safety, which to some degree increases the instability of the Web service. For example, the PHP extensions used by mod_php must also be thread-safe, or they cannot be used in this mode.

Advantages: lower memory usage, better performance under high concurrency.

Disadvantages: thread safety must be considered, and the locks it introduces add CPU overhead.
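As a contrast with the prefork sketch, a minimal sketch of the thread-per-connection idea (illustrative only; real worker MPM runs several such processes, each with a listener thread and a thread pool):

```c
/* Worker-style sketch: one process, many threads sharing one address
 * space and one listening socket, so each extra unit of concurrency
 * costs a thread stack rather than a whole copied process. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define NUM_THREADS 8

static void *worker(void *arg) {
    int listen_fd = *(int *)arg;
    for (;;) {
        int conn = accept(listen_fd, NULL, NULL);
        if (conn < 0) continue;
        const char *resp = "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok";
        write(conn, resp, strlen(resp));
        close(conn); /* a keep-alive connection would pin this thread instead */
    }
    return NULL;
}

int main(void) {
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);
    bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(listen_fd, 128);

    pthread_t tids[NUM_THREADS];
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_create(&tids[i], NULL, worker, &listen_fd);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(tids[i], NULL);
}
```

Because all threads touch the same globals, any shared state here would need the locking discussed in the CPU section below.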
3. Event MPM: multi-process plus multi-thread, with epoll introduced

This is a relatively new Apache mode, and in current versions (e.g. Apache 2.4.10) it is already stable and usable. It is very similar to worker mode; the biggest difference is that it solves the waste of threads being occupied long-term in keep-alive scenarios. In event MPM, a dedicated thread manages the keep-alive connections, passes a request to a service thread only when a real request arrives, and lets the service thread be released as soon as it finishes. This reduces the resources wasted by connections that are "held but unused" and strengthens request handling under high concurrency. Because fewer "idle" threads are needed, the thread count drops, and so does the memory footprint in the same scenario.

When event MPM encounters an incompatible module, it falls back to worker-style behavior, where one worker thread handles one request. The modules officially shipped with new versions of Apache all support event MPM. Note that event MPM requires the Linux system's epoll support (Linux 2.6+) to be enabled. Of Apache's three modes, event MPM saves the most memory in real application scenarios.
4. Using lightweight Nginx as the Web server

Although Apache's continuous optimization has reduced its memory footprint and thus increased its capacity for high concurrency, Apache is, as mentioned earlier, an old and mature Web server that integrates many stable modules; it is heavyweight. Nginx is a lightweight Web server and naturally occupies less memory than Apache. Moreover, Nginx serves N connections from one process, instead of Apache's approach of adding processes/threads to support more connections. Nginx creates far fewer processes/threads, saving a great deal of memory overhead.

In QPS benchmarks on static files, Nginx performs roughly 3 times better than Apache. For dynamic files such as PHP, Nginx usually communicates with PHP-FPM over FastCGI, with PHP running as an independent external service, whereas Apache usually compiles PHP in as its own submodule (new versions of Apache also support FastCGI). On PHP dynamic files, Nginx performs slightly worse than Apache.
5. Sendfile to save memory

Many Web services, Apache and Nginx among them, support sendfile. Sendfile reduces the copying of data into "user-space memory" (the user buffer), which in turn reduces memory usage. Of course, many readers' first reaction is to ask why. To explain the principle as clearly as possible, let's first go back to how Linux kernel space and user space interact.

In general, user space (the memory space our program lives in) does not directly read, write or operate devices (disk, network, terminal, etc.); the kernel usually acts as a "middleman" to perform the device operation or read/write.

Take the simplest example of disk I/O: read file A from disk and write it to file B. File A's data starts on disk, is loaded into the "kernel buffer", and is then copied into the "user buffer" before we can process it. Writing works the same way in reverse: from the "user buffer" into the "kernel buffer", and finally out to file B on disk.

Copying files this way is laborious, so it was realized that the hop through the "user buffer" can be skipped. This is what mmap (memory mapping) implements: it establishes a direct mapping between disk space and memory, so data is no longer copied into a "user buffer"; instead the program gets back a pointer into the mapped memory. Our file-copy example then becomes: file A's data is loaded from disk into the "kernel buffer", copied from there to file B's "kernel buffer", and written back to disk from that buffer. The process saves one memory copy and reduces memory usage at the same time.
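A hedged C sketch of the mmap idea (error handling trimmed for brevity; the paths are the caller's):

```c
/* Copy file A to file B through memory mappings instead of the
 * read()/write() round trip, so no separate user-space data buffer
 * is allocated: both pointers refer to page-cache-backed mappings. */
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int mmap_copy(const char *src_path, const char *dst_path) {
    int src = open(src_path, O_RDONLY);
    struct stat st;
    fstat(src, &st);

    int dst = open(dst_path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    ftruncate(dst, st.st_size); /* destination must be sized before mapping */

    void *a = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, src, 0);
    void *b = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, dst, 0);

    memcpy(b, a, st.st_size); /* the single remaining copy, page cache to page cache */

    munmap(a, st.st_size);
    munmap(b, st.st_size);
    close(src);
    close(dst);
    return 0;
}
```

The write-back to disk happens when the kernel flushes the dirty MAP_SHARED pages; no user buffer ever holds the file contents.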
Back to sendfile. Put simply, sendfile is similar to mmap in that it eliminates the copy of data from the "kernel buffer" to the "user buffer".

By default (without sendfile), reading a disk file and sending it to a socket goes: disk → kernel buffer → user buffer → socket buffer → network card.

After using sendfile, the path becomes: disk → kernel buffer → socket buffer → network card, with no round trip through user space.
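A minimal C sketch of serving a file with Linux's sendfile(2) (illustrative; the socket is assumed to be an already-connected descriptor):

```c
/* sendfile(out_fd, in_fd, &offset, count): the kernel moves data from
 * the page cache straight to the socket, never into a user buffer. */
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

ssize_t send_file_to_socket(int sock_fd, const char *path) {
    int file_fd = open(path, O_RDONLY);
    if (file_fd < 0) return -1;

    struct stat st;
    fstat(file_fd, &st);

    off_t offset = 0;
    ssize_t sent = 0;
    while (offset < st.st_size) {
        ssize_t n = sendfile(sock_fd, file_fd, &offset, st.st_size - offset);
        if (n <= 0) break; /* error or closed socket; a real server would inspect errno */
        sent += n;
    }
    close(file_fd);
    return sent;
}
```

Nginx enables this path with the `sendfile on;` directive, and Apache with `EnableSendfile On`.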
This saves not only memory but also CPU overhead.
Fourth, saving the Web server's CPU

CPU is another core system resource for a Web server. Although we generally assume that executing business logic consumes most of the CPU, for a Web service program the context switching between processes/threads also consumes CPU. A process/thread usually cannot hold the CPU for long: when it blocks, or its time slice runs out, it must yield; a context switch occurs and the CPU time slice moves from the old process/thread to a new one. In addition, with high numbers of concurrent connections, polling and checking the status of all the user connections (socket file descriptors) also costs CPU.

The development and evolution of Apache and Nginx have likewise worked to reduce CPU overhead.
1. Select/poll (the I/O multiplexing of earlier Apache versions)

Generally, a Web service must maintain many socket file descriptors for communicating with users, and I/O multiplexing exists to manage and check those descriptors conveniently. Earlier Apache versions used select: in a nutshell, hand the kernel the set of socket file descriptors we care about and let the kernel tell us which of them are operable. Poll works on basically the same principle as select, so we discuss them together and will not dwell on their differences.

What select/poll returns is the same collection of file descriptors we submitted (with the kernel having set the status bits of the sockets that are readable, writable or exceptional), and we must scan the set to find the descriptors we can act on, over and over again. In real application scenarios, most of the socket descriptors we monitor are "idle", i.e. not operable; we scan the entire collection just to find the operable few. So as we monitor more socket descriptors (more and more concurrent user connections), this polling work gets heavier and heavier, raising CPU overhead.

Conversely, if nearly all of the monitored socket descriptors are "active", this mode is actually a reasonable fit.
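A hedged C sketch of the select() pattern (descriptor handling simplified; note select is also capped at FD_SETSIZE, typically 1024 descriptors):

```c
/* select() pattern: hand the kernel the whole set each time, then scan
 * every descriptor, idle ones included, to find the ready few. */
#include <sys/select.h>
#include <unistd.h>

void select_loop(int fds[], int nfds) {
    for (;;) {
        fd_set readable;
        FD_ZERO(&readable);
        int maxfd = -1;
        for (int i = 0; i < nfds; i++) { /* rebuild the set on every pass */
            FD_SET(fds[i], &readable);
            if (fds[i] > maxfd) maxfd = fds[i];
        }

        if (select(maxfd + 1, &readable, NULL, NULL, NULL) <= 0)
            continue;

        /* The cost the article describes: an O(n) scan over ALL monitored
         * descriptors just to discover which ones the kernel marked. */
        for (int i = 0; i < nfds; i++) {
            if (FD_ISSET(fds[i], &readable)) {
                char buf[4096];
                read(fds[i], buf, sizeof(buf));
                /* ... parse and handle the request ... */
            }
        }
    }
}
```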
2. Epoll (used by the new Apache's event MPM, Nginx, and others)

Epoll, officially supported since Linux 2.6, is an improvement over select/poll. We still tell the kernel which socket file descriptors we are interested in, but we register them once, and the kernel uses an internal callback to mark a descriptor ready and notify us. So instead of scanning the whole set of socket descriptors, we directly receive the descriptors that are already operable and never traverse the mostly "idle" ones. Even as the number of monitored socket descriptors grows, we only touch the "active and operable" ones.

There is, in fact, an extreme scenario: if almost all our file descriptors are "active", the large volume of callback work adds its own CPU overhead. In the real world of Web services, however, the connection set contains many "idle" connections most of the time.
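The matching epoll sketch (same simplifications as above):

```c
/* epoll pattern: register interest once with epoll_ctl(), then
 * epoll_wait() returns only descriptors that are actually ready. */
#include <sys/epoll.h>
#include <unistd.h>

#define MAX_EVENTS 64

void epoll_loop(int fds[], int nfds) {
    int epfd = epoll_create1(0);

    /* One-time registration; no per-iteration rebuild as with select(). */
    for (int i = 0; i < nfds; i++) {
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = fds[i] };
        epoll_ctl(epfd, EPOLL_CTL_ADD, fds[i], &ev);
    }

    struct epoll_event events[MAX_EVENTS];
    for (;;) {
        /* Idle descriptors cost nothing here: only ready ones come back. */
        int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
        for (int i = 0; i < n; i++) {
            char buf[4096];
            read(events[i].data.fd, buf, sizeof(buf));
            /* ... parse and handle the request ... */
        }
    }
}
```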
3. Creation, destruction and context switching of threads/processes

Typically, Apache at a given moment has one process/thread serving one connection; as a result, Apache ends up with many processes/threads serving its many connections. At peak times the Web service sets up a great many processes/threads, which brings a lot of context-switch overhead. Nginx, on the other hand, usually runs only one master process plus a few worker child processes, and one worker process serves many connections, thus saving CPU context-switch costs.

Although the two models are different, neither can simply be declared superior; broadly speaking, each has its own strengths.
4. The CPU overhead of multithreaded locks

Apache's worker and event modes both use multithreading. Because threads share the parent process's memory space, accessing shared data creates contention, i.e. the thread-safety problem, so locks are usually introduced (common thread locks on Linux include the mutex and the read-write lock, rwlock). A thread that acquires the lock continues executing; one that fails usually blocks and waits. Introducing a locking mechanism often adds considerable complexity to a program, along with the risk of thread "deadlock" or "starvation" (multiple processes accessing shared inter-process resources suffer the same problems).

Deadlock: two threads each lock a resource the other wants, block waiting for each other, and neither can ever proceed.

Starvation: a thread can never acquire the lock on the resource it needs and so can never take its next step.

To avoid the problems locks bring, the program must take on extra complexity. The usual solutions include the following two rules (a sketch of both follows the list):

(1) Lock resources in an agreed order: everyone locks shared resource X first, and only after that succeeds may they lock shared resource Y.

(2) If a thread holds resource X but fails to lock resource Y, it abandons the attempt and releases the X it already held.
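A minimal C sketch of both rules with pthread mutexes (resource names X and Y are the hypothetical ones from the list):

```c
/* Rule (1): a single agreed order, X before Y, in every code path,
 * so the circular wait behind deadlock can never form. */
#include <pthread.h>

pthread_mutex_t lock_x = PTHREAD_MUTEX_INITIALIZER; /* guards resource X */
pthread_mutex_t lock_y = PTHREAD_MUTEX_INITIALIZER; /* guards resource Y */

void update_both(void) {
    pthread_mutex_lock(&lock_x);  /* always X first ... */
    pthread_mutex_lock(&lock_y);  /* ... then Y */
    /* ... modify the shared resources ... */
    pthread_mutex_unlock(&lock_y);
    pthread_mutex_unlock(&lock_x);
}

/* Rule (2): if Y cannot be taken, release X too and report failure,
 * rather than holding X while blocking on Y. */
int try_update_both(void) {
    pthread_mutex_lock(&lock_x);
    if (pthread_mutex_trylock(&lock_y) != 0) {
        pthread_mutex_unlock(&lock_x); /* back off: give up what we hold */
        return -1;                     /* caller may retry later */
    }
    /* ... modify the shared resources ... */
    pthread_mutex_unlock(&lock_y);
    pthread_mutex_unlock(&lock_x);
    return 0;
}
```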
When PHP runs under Apache's worker and event modes, it must also be thread-safety compatible. In general, the official libraries of new PHP versions have no thread-safety problems; it is third-party extensions that deserve attention. PHP implements thread safety not with locks but by requesting an independent copy of the global variables for each thread, effectively giving each thread its own private memory space, at the cost of somewhat more memory. The advantage is that no complex locking mechanism needs to be introduced, and the CPU overhead of locking is also avoided.

Incidentally, PHP-FPM (FastCGI), which often works with Nginx, uses multiple processes, so it has no thread-safety issues.