How to improve the concurrent processing ability of the server

2025-01-16 Update From: SLTechnology News&Howtos

Shulou (Shulou.com), 05/31

This article explains in detail how to improve the concurrent processing capacity of a server.

What is the concurrent processing capacity of a server?

The more requests a server can handle per unit time, the stronger its concurrent processing capacity.

How to measure the server's concurrent processing capacity

1. Throughput

Throughput is the maximum number of requests the server processes per unit time, measured in req/s.

From the point of view of the server, the actual number of concurrent users can be understood as the total number of file descriptors currently maintained by the server representing different users, that is, the number of concurrent connections.

The server generally limits the maximum number of users it can serve at the same time, for example through the MaxClients parameter of Apache.

Going a little further: the server wants to sustain high throughput, while each user only wants to wait as little as possible. The two goals cannot both be fully satisfied, so the point that balances them is the maximum number of concurrent users we are looking for.

2. Stress testing

One principle must be clarified first: if 100 users each make 10 requests to the server at the same time, or 1 user makes 1000 requests to the server in a row, is the pressure on the server the same?

It is actually different, because for each user, sending a request continuously actually means sending one request and receiving the response data before sending the next request.

So when one user makes 1000 requests in a row, there is only one request in the receive buffer of the server's network card at any moment. When 100 users each make 10 requests at the same time, up to 100 requests can be waiting in that buffer, so the server is clearly under more pressure.

Conditions to consider before a stress test:

Number of concurrent users: the total number of users sending requests to the server at a given moment (HttpWatch)

Total number of requests

Description of the requested resource

Request waiting time (user waiting time)

Average user request waiting time

Average server request processing time

Hardware environment

The times of interest in a stress test fall into two categories:

Average user request waiting time (network transmission time and the local computing time of the user's PC are not counted here)

Average server request processing time

The average user request waiting time mainly measures the quality of service for a single user at a given number of concurrent users, while the average server request processing time is the reciprocal of throughput.
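
The two times can be related with a little arithmetic. The following is a minimal sketch, assuming the rule of thumb that user waiting time is roughly server processing time multiplied by the number of concurrent users. The function names and the figures (500 req/s, 100 users) are made up for illustration.

```python
def average_processing_time(throughput_rps: float) -> float:
    """Average server processing time per request, in seconds.

    This is simply the reciprocal of throughput.
    """
    if throughput_rps <= 0:
        raise ValueError("throughput must be positive")
    return 1.0 / throughput_rps


def average_user_wait(throughput_rps: float, concurrent_users: int) -> float:
    """Rule-of-thumb user waiting time: processing time * concurrent users."""
    return average_processing_time(throughput_rps) * concurrent_users


if __name__ == "__main__":
    # Hypothetical measurement: 500 req/s at 100 concurrent users.
    print("processing time (s):", average_processing_time(500))
    print("user wait (s):", average_user_wait(500, 100))
```

Doubling the number of concurrent users at the same throughput doubles the estimated wait, which is why the maximum acceptable concurrency is a trade-off rather than a fixed number.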

Generally speaking, average user request waiting time = average server request processing time * number of concurrent users.

How to improve the server's concurrent processing capacity

1. Improve CPU concurrent computing power

A server can handle multiple requests at the same time because the operating system is designed around multiple execution flows, which lets multiple tasks take turns using system resources.

These resources include CPU, memory, and I/O. Here I/O mainly refers to disk I/O and network I/O.

Multi-process & multi-thread

The usual implementation of an execution flow is the process. With multiple processes, tasks can take turns using CPU time, and CPU computation can overlap with I/O operations. I/O here mainly means disk I/O and network I/O, which are painfully slow compared with the CPU.

In fact, most processes spend most of their time on I/O operations.

With the DMA technology of modern computers, the CPU does not have to take part in the whole I/O operation. For example, through a system call the process has the CPU send instructions to the network card or disk; the process is then suspended, releasing the CPU, and when the I/O device finishes its work it raises an interrupt so that the process becomes ready again.

For a single task, the CPU is idle most of the time, which is exactly where multiple processes become important.

Multiple processes do more than improve CPU concurrency. Their advantage also lies in the stability and robustness that come from independent memory address spaces and life cycles: the crash of one process does not affect the others.

However, processes also have the following disadvantages:

fork() system call overhead. Mitigation: prefork

Inter-process scheduling and context switching costs. Mitigation: reduce the number of processes

Memory duplication across a large number of processes. Mitigation: shared memory

IPC programming is relatively cumbersome
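
The isolation between processes can be sketched with os.fork(). This is an illustration for Unix-like systems only, not how any particular web server is implemented. The function run_workers and the exit codes are hypothetical, and a real prefork server would have each child serve requests instead of exiting immediately.

```python
import os


def run_workers(exit_codes):
    """Fork one child per entry; each child exits with its given code.

    Returns the exit codes collected by the parent, in fork order.
    """
    pids = []
    for code in exit_codes:
        pid = os.fork()
        if pid == 0:
            # Child process: a real worker would serve requests here.
            # os._exit leaves immediately without running the parent's
            # cleanup handlers or flushing its buffers a second time.
            os._exit(code)
        pids.append(pid)

    collected = []
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        collected.append(os.WEXITSTATUS(status))
    return collected


if __name__ == "__main__":
    # The second worker "fails" with exit code 1; the others still exit 0.
    print(run_workers([0, 1, 0]))
```

The parent only observes each child's exit status, so a failing worker can be detected and replaced without disturbing the rest.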

Reduce process switching

When the hardware context is loaded and saved frequently, the time consumed is considerable. The nmon tool can be used to monitor the number of context switches per second on the server.

The simplest way to minimize context switching is to reduce the number of processes: use threads where possible and combine them with other I/O models when designing the concurrency strategy.

You can also bind a process to a CPU (CPU affinity) to raise the hit ratio of the CPU cache. If a process keeps moving between CPUs, the cache contents on the old CPU become useless to it.
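
On Linux, CPU binding can be sketched from Python with os.sched_setaffinity. The helper name pin_to_one_cpu is made up for this example, the call is not available on every platform (hence the hasattr check), and which CPU to choose is an assumption that depends on the workload.

```python
import os


def pin_to_one_cpu():
    """Pin the current process to a single CPU it is already allowed to use.

    Returns the chosen CPU id, or None where the platform lacks the call.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # for example macOS or Windows
    allowed = os.sched_getaffinity(0)  # 0 means "the calling process"
    cpu = min(allowed)
    os.sched_setaffinity(0, {cpu})
    return cpu


if __name__ == "__main__":
    has_affinity = hasattr(os, "sched_getaffinity")
    original = os.sched_getaffinity(0) if has_affinity else None
    print("pinned to CPU", pin_to_one_cpu())
    if original is not None:
        os.sched_setaffinity(0, original)  # restore the original mask
```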

Reduce the use of unnecessary locks

When the server processes a large number of concurrent requests, the request-handling tasks compete for some resources, so a "lock" mechanism is generally used to control access to them.

When a task takes up resources, we lock the resources, while other tasks are waiting for the lock to be released, a phenomenon called lock contention.

Given the nature of lock contention, we should minimize the competition of concurrent requests for shared resources.

For example, turn off the server access log if that is acceptable, which can greatly reduce the time spent waiting for locks and keeps unrelated requests from waiting needlessly.
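
One common way to cut contention is to keep work local and touch the shared resource as rarely as possible. The sketch below is only an illustration with a hypothetical counter: each thread accumulates privately and takes the lock once, instead of once per increment.

```python
import threading


def count_with_batched_lock(num_threads, increments_per_thread):
    """Each thread counts locally and takes the shared lock only once."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        local = 0
        for _ in range(increments_per_thread):
            local += 1  # private to this thread, so no lock is needed
        with lock:  # one short critical section per thread
            total += local

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return total


if __name__ == "__main__":
    print(count_with_batched_lock(4, 10_000))
```

With this shape the lock is acquired num_threads times in total, no matter how many increments each thread performs.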

Lock-free programming deserves a mention here: the locking mechanism is left to the kernel, and atomic operations are used in place of locks to protect access to shared resources.

With atomic operations, the actual write is performed with the lock instruction, which prevents other tasks from writing to the same memory at that moment and so avoids data races. Atomic operations are faster than locks, generally more than twice as fast.

For example, writing to a file opened in append mode with fopen() and fwrite() relies on this principle. Lock-free programming is more complex, but it is efficient and has a low probability of deadlock.

Consider process priority

The process scheduler dynamically adjusts the priority of processes in the run queue. You can observe a process's PR value with top.

Consider the system load

You can check /proc/loadavg at any time. The load average shown by top gives the same information.
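
The same figures can be read programmatically. This is a small sketch: parse_loadavg and load_per_cpu are hypothetical helper names, and treating a per-CPU load near 1.0 as a warning sign is a common rule of thumb, not a hard threshold.

```python
import os


def parse_loadavg(text):
    """Parse the 1, 5 and 15 minute averages from a /proc/loadavg line."""
    one, five, fifteen = (float(field) for field in text.split()[:3])
    return one, five, fifteen


def load_per_cpu(one_minute_load, cpu_count):
    """Normalise the load by the number of CPUs."""
    return one_minute_load / cpu_count


if __name__ == "__main__":
    # os.getloadavg() returns the same three averages on Unix-like systems.
    one, five, fifteen = os.getloadavg()
    print(one, five, fifteen, load_per_cpu(one, os.cpu_count() or 1))
```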

Consider CPU utilization

In addition to CPU usage in user space and kernel space, pay attention to I/O wait, which is the percentage of time the CPU is idle while waiting for I/O operations to complete (see the wa value in top).
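
On Linux the raw counters behind this value are on the aggregate cpu line of /proc/stat. The sketch below computes the I/O wait share between two samples of that line. The function names and the sample lines in the usage note are made up, and only the first eight counters (user through steal) are summed, which is an assumption about how to total the time.

```python
def parse_cpu_line(line):
    """Return (iowait_ticks, total_ticks) from the aggregate 'cpu' line."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    # Field order: user nice system idle iowait irq softirq steal ...
    ticks = [int(value) for value in fields[1:9]]
    return ticks[4], sum(ticks)


def iowait_percent(before, after):
    """Share of CPU time spent idle waiting for I/O between two samples."""
    wait_before, total_before = parse_cpu_line(before)
    wait_after, total_after = parse_cpu_line(after)
    elapsed = total_after - total_before
    if elapsed <= 0:
        return 0.0
    return 100.0 * (wait_after - wait_before) / elapsed
```

For example, sampling "cpu 100 0 50 800 50 0 0 0 0 0" and later "cpu 150 0 70 1500 250 10 20 0 0 0" covers 1000 ticks, of which 200 were I/O wait.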

2. Consider reducing memory allocation and release

In the working process of the server, a large amount of memory is needed, which makes the allocation and release of memory particularly important.

The time spent allocating memory for intermediate temporary variables and copying data can be reduced by improving data structures and algorithmic complexity, and servers also use their own strategies to improve efficiency.

For example, Apache requests a large block of memory as a memory pool when it starts, and then takes memory directly from the pool when needed. Nothing has to be reallocated, which avoids the memory defragmentation overhead caused by frequent allocation and release.

For example, Nginx handles many connections inside each of a small number of worker processes instead of dedicating a process to every connection, so those connections share memory resources, which greatly reduces its overall memory usage.
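
The pooling idea can be sketched in a few lines. This is a generic illustration, not Apache's actual pool implementation. The class name BufferPool and its methods are hypothetical.

```python
class BufferPool:
    """Preallocates fixed-size buffers and hands them out for reuse."""

    def __init__(self, buffer_size, count):
        self._buffer_size = buffer_size
        # All allocation happens once, up front.
        self._free = [bytearray(buffer_size) for _ in range(count)]

    def acquire(self):
        # Reuse a pooled buffer when one is free; otherwise fall back to
        # a fresh allocation so callers never have to wait.
        if self._free:
            return self._free.pop()
        return bytearray(self._buffer_size)

    def release(self, buffer):
        # Contents are left as they are; the next user overwrites them.
        self._free.append(buffer)


if __name__ == "__main__":
    pool = BufferPool(buffer_size=4096, count=8)
    buf = pool.acquire()
    buf[:5] = b"hello"
    pool.release(buf)
    print(pool.acquire() is buf)
```

Releasing a buffer and acquiring again returns the same object, so steady-state request handling performs no new allocations.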

In addition, Nginx's phased memory allocation strategy, which allocates on demand and releases in time, keeps memory usage in a very small range.

In addition, you can consider shared memory.

Shared memory is memory that can be accessed by different central processing units (CPUs) in a multiprocessor system, or shared by different processes. It is a very fast form of inter-process communication.

The downside of shared memory is that it is hard to keep data consistent across multiple machines.

The shell command ipcs displays the state of shared memory on the system. The function shmget creates or opens a shared memory segment, shmat attaches an existing segment to the process's address space, shmctl performs various control operations on a segment, and shmdt detaches it.
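
For a quick experiment without writing C, Python's multiprocessing.shared_memory module (Python 3.8+) offers a similar create, attach, detach, remove cycle. Note the swap: on Unix this module is backed by POSIX shared memory rather than the System V shmget/shmat calls named above, so it only illustrates the idea. The function name shared_memory_roundtrip is made up, and both handles live in one process here purely to keep the example self-contained.

```python
from multiprocessing import shared_memory


def shared_memory_roundtrip(payload):
    """Write bytes into a new segment, then read them via a second handle."""
    segment = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        segment.buf[:len(payload)] = payload
        # A second handle attached by name, as another process would do.
        peer = shared_memory.SharedMemory(name=segment.name)
        try:
            return bytes(peer.buf[:len(payload)])
        finally:
            peer.close()  # detach this handle
    finally:
        segment.close()
        segment.unlink()  # remove the segment once nobody needs it


if __name__ == "__main__":
    print(shared_memory_roundtrip(b"hello"))
```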

3. Consider using persistent connections

A persistent connection, also called a long connection, is a common mode of TCP communication: multiple pieces of data are sent over one TCP connection, and the connection is kept open in between.

The opposite is a short connection: a connection is established, one piece of data is sent, and the connection is closed; a new connection is then established for the next piece of data, over and over again.

Whether or not to use persistent connections depends entirely on the characteristics of the application.

From a performance point of view, establishing a TCP connection is not cheap. Where the application allows it, the fewer connections are established, the better the performance. The speed-up is especially noticeable for bursts of small requests, such as the many images on a web page.

HTTP persistent connections require the cooperation of browsers and web servers. At present, browsers generally support persistent connections, as shown in the HTTP request header that contains a declaration about persistent connections, as follows: Connection: Keep-Alive

Mainstream web servers support persistent connections. In Apache, for example, KeepAlive Off disables them.

Another key to using persistent connections effectively is the timeout setting, that is, when an idle persistent connection should be closed.

The default setting in Apache is 5s. If this timeout is too long, resources are held to no purpose and a large number of idle processes are kept alive, which hurts server performance.
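
Connection reuse can be observed locally with Python's standard library. The sketch below is only a demonstration: the handler class PortEchoHandler and the function ports_seen_over_one_connection are hypothetical names, and the tiny server exists just so the example is self-contained. The server echoes the client's source port, which stays the same for as long as one TCP connection is reused.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class PortEchoHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # HTTP/1.1 keeps connections open by default

    def do_GET(self):
        # Reply with the client's source port.
        body = str(self.client_address[1]).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def ports_seen_over_one_connection(request_count):
    """Send several requests over one client connection; return the ports seen."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), PortEchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
    try:
        ports = []
        for _ in range(request_count):
            conn.request("GET", "/")
            ports.append(conn.getresponse().read())
        return ports
    finally:
        conn.close()
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(ports_seen_over_one_connection(3))
```

All requests report the same source port, which shows that no new TCP handshake was needed between them.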

4. Improve the I/O model

I/O operations fall into several types depending on the device, such as memory I/O, network I/O, and disk I/O.

Network I/O and disk I/O are much slower than memory I/O. RAID disk arrays can speed up disk I/O by using disks in parallel, and buying ample dedicated network bandwidth and using high-bandwidth network adapters can speed up network I/O.

But these I/O operations go through kernel system calls, which have to be scheduled by the CPU, so the CPU ends up wasting valuable time waiting for slow I/O.

We want the CPU to spend as little time as possible on scheduling I/O operations. How to make a fast CPU and slow I/O devices work well together is a topic that modern computing has always had to deal with. The essential difference between the various I/O models lies in how the CPU participates.

DMA technology

Data transfer between an I/O device and memory is carried out by the DMA controller. In DMA mode, the CPU only needs to issue a command to the DMA controller and let it handle the transfer, which saves a great deal of system resources.

Asynchronous I/O

Asynchronous I/O means that after requesting data, the process can go on with other tasks and is later notified when the I/O operation finishes, so the process does not block while the data is read or written.

Asynchronous I/O is non-blocking: the function returns immediately, while the actual transfer proceeds in the background, which allows CPU processing and I/O operations to overlap well.
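
The overlap can be illustrated with asyncio. This is a cooperative, event-loop based sketch rather than kernel-level asynchronous I/O, and the three "operations" are simulated with sleeps. The names fake_io and run_overlapped are hypothetical.

```python
import asyncio
import time


async def fake_io(name, delay):
    """Stand-in for a slow I/O operation; sleeping yields to other tasks."""
    await asyncio.sleep(delay)
    return name


async def run_overlapped():
    """Start three simulated operations together and time the whole batch."""
    started = time.monotonic()
    results = await asyncio.gather(
        fake_io("disk", 0.1),
        fake_io("network", 0.1),
        fake_io("cache", 0.1),
    )
    return results, time.monotonic() - started


if __name__ == "__main__":
    results, elapsed = asyncio.run(run_overlapped())
    print(results, round(elapsed, 2))
```

Because the waits overlap, the batch takes roughly as long as the slowest operation instead of the sum of all three.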

I/O multiplexing

A server needs to handle a large number of file descriptors at the same time. With the synchronous non-blocking I/O model, if data may arrive on many TCP connections at once, the receive call must be invoked on each socket in turn, regardless of whether that socket has any data to receive.

If most of the sockets have no data to receive, the process wastes a lot of CPU time checking them.

I/O multiplexing readiness notification provides a high-performance way to check the readiness of a large number of file descriptors. It lets the process monitor all file descriptors at once, quickly obtain the ones that are ready, and then access data only on those descriptors.

epoll supports both level-triggered and edge-triggered modes. Edge triggering is theoretically more efficient, but the code is harder to get right, because any accidentally missed event results in a request processing error.

epoll has two major improvements.

First, epoll reports only the ready file descriptors. A call to epoll_wait() does not return the descriptors themselves but a value giving the number of ready descriptors; you then read that many entries in turn from the event array you passed to the call. Because only the ready entries are copied out, the cost of each call depends on the number of ready descriptors rather than on the total number being monitored.

Second, epoll uses event-based readiness notification. Each file descriptor is registered in advance with epoll_ctl(). Once a descriptor becomes ready, the kernel uses a callback-like mechanism to activate it, and the process is notified when it calls epoll_wait().
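
The register-then-wait pattern can be tried with Python's selectors module, whose DefaultSelector is backed by epoll on Linux and by other mechanisms elsewhere. This is a minimal sketch using local socket pairs instead of real client connections. The function name ready_sockets_demo and the labels "a" and "b" are made up.

```python
import selectors
import socket


def ready_sockets_demo():
    """Register two socket pairs, write to one, and report which is ready."""
    selector = selectors.DefaultSelector()  # epoll-backed on Linux
    a_send, a_recv = socket.socketpair()
    b_send, b_recv = socket.socketpair()
    try:
        # Register interest once, up front (the registration step).
        selector.register(a_recv, selectors.EVENT_READ, data="a")
        selector.register(b_recv, selectors.EVENT_READ, data="b")

        a_send.sendall(b"ping")  # only connection "a" has data

        # One call returns just the ready descriptors (the wait step).
        events = selector.select(timeout=1.0)
        return [key.data for key, _mask in events]
    finally:
        selector.close()
        for sock in (a_send, a_recv, b_send, b_recv):
            sock.close()


if __name__ == "__main__":
    print(ready_sockets_demo())
```

Only the socket with pending data is reported, so the process never has to poll the idle one.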

Sendfile

Most of the time, we ask the server for static files, such as pictures, stylesheets, etc.

When handling these requests, the data of the disk file first passes through a kernel buffer and is then copied to user memory space. Without being processed in any way, it is copied back to the kernel buffer associated with the network card and finally sent to the network card for transmission.

Linux provides the sendfile() system call, which transfers a specified part of a disk file directly to the socket descriptor that represents the client. This speeds up requests for static files while reducing CPU and memory overhead.

Applicable scenario: for requests for small static files, the benefit of sendfile is less noticeable, because sending the data takes up a much smaller share of the whole request than it does for large files.
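
Python exposes this path through socket.sendfile(), which uses os.sendfile where the platform supports it and falls back to ordinary send() otherwise. The sketch below is an illustration over a local socket pair. The function name send_file_over_socket is hypothetical, and the payload is kept small so that it fits in the socket buffer and the single-threaded example does not block.

```python
import socket
import tempfile


def send_file_over_socket(payload):
    """Write payload to a temp file, send it, and return what arrives."""
    sender, receiver = socket.socketpair()
    try:
        with tempfile.TemporaryFile() as handle:
            handle.write(payload)
            handle.flush()
            handle.seek(0)
            # With os.sendfile available, the file data is not copied
            # through a user-space buffer in this process.
            sent = sender.sendfile(handle)
        sender.close()  # signals end of stream to the receiver

        chunks = []
        while True:
            chunk = receiver.recv(65536)
            if not chunk:
                break
            chunks.append(chunk)
        return sent, b"".join(chunks)
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    sent, received = send_file_over_socket(b"x" * 4096)
    print(sent, len(received))
```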

Memory mapping

The Linux kernel provides a special way to access disk files, which can associate a block of address space in memory with the disk file we specify, so that access to this block of memory is converted into access to disk files. This technique is called memory mapping.

In most cases, memory mapping can improve the performance of disk I/O. Instead of using system calls such as read() or write() to access files, memory is associated with a disk file through the mmap() system call, and the file is then accessed as freely as memory.

Disadvantages: memory mapping can lead to greater memory overhead when dealing with larger files, and the loss outweighs the gain.
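
Python's mmap module wraps the same system call, which makes the idea easy to try. This is a small sketch: the function name edit_file_through_mmap is made up, and a temporary file stands in for a real data file.

```python
import mmap
import tempfile


def edit_file_through_mmap(initial, offset, replacement):
    """Map a temp file into memory, patch bytes in place, read the file back."""
    with tempfile.TemporaryFile() as handle:
        handle.write(initial)
        handle.flush()
        # Length 0 maps the whole file; the file must not be empty.
        with mmap.mmap(handle.fileno(), 0) as mapped:
            # Slice assignment writes to the mapping, and so to the file.
            mapped[offset:offset + len(replacement)] = replacement
            mapped.flush()
        handle.seek(0)
        return handle.read()


if __name__ == "__main__":
    print(edit_file_through_mmap(b"hello world", 0, b"HELLO"))
```

The file is modified with ordinary slice assignment, without an explicit write() call on the file object.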

Direct I/O

In Linux 2.6, there is no essential difference between memory mapping and accessing files directly, because in both cases the data is copied twice: once between the disk and the kernel buffer, and once between the kernel buffer and user-space memory.

The kernel buffer was introduced to improve the access performance of disk files. However, some complex applications, such as database servers, want to bypass the kernel buffer and implement and manage their own I/O buffer in user space to improve performance further. A database, for example, can raise its query cache hit rate with a caching strategy better suited to its workload.

On the other hand, bypassing kernel buffers can also reduce the overhead of system memory, because the kernel buffer itself is using system memory.

Linux adds the parameter option O_DIRECT to the open () system call to access the file directly, bypassing the kernel buffer.

In MySQL, the InnoDB storage engine manages the caching of data and indexes itself, and a raw partition can be allocated in the my.cnf configuration to skip the kernel buffer and achieve direct I/O.

5. Improve server concurrency strategy

The purpose of a server concurrency strategy is to make I/O operations and CPU computation overlap as much as possible: on the one hand the CPU should not sit idle while waiting for I/O, and on the other hand the CPU should spend as little time as possible on I/O scheduling.

One process handles one connection, non-blocking I/O

With this approach, when multiple concurrent requests arrive at the same time, the server must have multiple processes ready to handle them. The overhead of those processes limits the number of concurrent connections.

However, from the perspective of stability and compatibility, it is relatively safe, the crash of any child process will not affect the server itself, and the parent process can create new child processes; a typical example of this strategy is Apache's fork and prefork mode.

Apache is a reasonable choice for sites with low concurrency (for example, fewer than 150 concurrent connections) that also depend on other Apache features.

One thread handles one connection, non-blocking IO

This approach lets multiple threads in one process handle multiple connections, with one thread per connection. Apache's worker mode is a typical example, allowing it to support more concurrent connections. However, the overall performance of this mode is not as good as that of prefork, so worker mode is generally not chosen.

One process handles multiple connections, asynchronous I/O

A potential prerequisite for a thread to process multiple connections at the same time is to use IO multiplexing readiness notification.

In this case, a process that handles multiple connections is called a worker process or a service process. The number of workers can be configured, for example worker_processes 4 in Nginx.

One thread handles multiple connections, asynchronous IO

Even if there is a high-performance IO multiplexing ready notification, the wait for disk IO is inevitable. A more efficient approach is to use asynchronous IO for disk files, which few Web servers actually support.

6. Improve the hardware environment

The hardware environment also deserves a mention. Upgrading the server's hardware configuration is often the most direct and simplest way to improve application performance, the so-called scale up. It is not discussed further here.
