
How to optimize Linux CPU when it reaches the bottleneck

2025-03-18 Update From: SLTechnology News&Howtos

This article shares how to optimize the Linux CPU when it reaches a bottleneck. The editor thinks it is very practical, so it is shared here for learning. I hope you get something out of reading it.

In Linux systems, cost constraints often lead to shortages of resources such as CPU, memory, network and I/O. This article analyzes how Linux processes and the CPU work, and summarizes methods of CPU performance optimization.

1. Analysis methods

Before understanding the load average, you need to understand the states of processes under Linux.

1.1. Process status

1.1.1. R (TASK_RUNNING), runnable state

Only processes in this state can run on a CPU. There may be multiple processes in the runnable state at the same time; the task_struct structures (process control blocks) of these processes are put into the run queue of the corresponding CPU (a process can appear in the run queue of at most one CPU). The task of the process scheduler is to select one process from each CPU's run queue to run on that CPU.

Many operating system textbooks define a process executing on a CPU as being in the RUNNING state, and a runnable process that has not yet been scheduled as being in the READY state. Under Linux these two states are unified into the single TASK_RUNNING state.
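
As a quick check, the ps command shows the state of every process; the first letter of the STAT column maps to the states described in this section (R, S, D, T, Z). A minimal sketch (the exact output naturally depends on the system):

# List processes with their state; the STAT column shows R, S, D, T or Z
ps -eo pid,stat,comm | head -n 20

# Count how many processes are currently in each state
ps -eo stat= | cut -c1 | sort | uniq -c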

1.1.2. S (TASK_INTERRUPTIBLE), interruptible sleep state

A process in this state is suspended because it is waiting for some event (such as a socket connection or a semaphore). The task_struct structures of these processes are placed on the wait queues of the corresponding events. When those events happen (triggered by an external interrupt or by another process), one or more processes on the corresponding wait queue are woken up. From the ps command we can see that, in general, the vast majority of processes in the process list are in the TASK_INTERRUPTIBLE state (unless the machine is heavily loaded). After all, there are only one or a few CPUs while there are often dozens or hundreds of processes; if most of them were not sleeping, how could the CPU keep up?

1.1.3. D (TASK_UNINTERRUPTIBLE), uninterruptible sleep state

Similar to the TASK_INTERRUPTIBLE state, the process is asleep, but at this point the process is uninterruptible.

Uninterruptible does not mean that the CPU does not respond to interrupts from external hardware, but that the process does not respond to asynchronous signals.

In most cases, a sleeping process should always be able to respond to asynchronous signals; otherwise you would be surprised to find that kill -9 cannot kill a sleeping process! It is then easy to understand why processes seen by the ps command are rarely in the TASK_UNINTERRUPTIBLE state, while the TASK_INTERRUPTIBLE state is common.

The TASK_UNINTERRUPTIBLE state exists because some kernel code paths cannot be interrupted. If the process responded to an asynchronous signal, a handler for that signal would be inserted into its execution flow (it may exist only in kernel mode, or it may extend into user mode), and the original flow would be interrupted (see "Linux Kernel Asynchronous Interrupt Analysis"). When a process operates on certain hardware (for example, it calls the read system call on a device file, and read eventually executes the corresponding device driver code and interacts with the physical device), the process may need the protection of the TASK_UNINTERRUPTIBLE state so that its interaction with the device is not interrupted, which could leave the device in an uncontrollable state. The TASK_UNINTERRUPTIBLE state in this case is always very short, and it is almost impossible to catch it with the ps command.

1.1.4. T (TASK_STOPPED or TASK_TRACED), stopped or traced state

Sending a SIGSTOP signal to a process puts it into the TASK_STOPPED state in response to the signal (unless the process is in the TASK_UNINTERRUPTIBLE state and therefore does not respond). SIGSTOP, like SIGKILL, is mandatory: user processes are not allowed to reset its handler through the signal family of system calls.

Sending a SIGCONT signal to the process restores it from the TASK_STOPPED state to the TASK_RUNNING state.
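
A small illustration of the stopped state, using a harmless background sleep process as the target (a sketch; the PID is whatever the shell reports):

sleep 600 &          # start a background process to experiment on
pid=$!
kill -STOP "$pid"    # the process enters the T (TASK_STOPPED) state
ps -o pid,stat,comm -p "$pid"
kill -CONT "$pid"    # the process leaves the stopped state
kill "$pid"          # clean up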

When a process is being traced, it is in the special TASK_TRACED state. "Being traced" means the process is paused, waiting for the tracing process to operate on it. For example, when a traced process stops at a breakpoint set in gdb, it is in the TASK_TRACED state. At other times, the traced process is in the states mentioned earlier.

For the process itself, the TASK_STOPPED and TASK_TRACED states are similar: both mean the process is paused. The TASK_TRACED state adds an extra layer of protection on top of TASK_STOPPED: a process in the TASK_TRACED state cannot be woken up by the SIGCONT signal. Only when the debugger performs operations such as PTRACE_CONT or PTRACE_DETACH through the ptrace system call (operations specified by ptrace parameters), or the debugger exits, can the debugged process return to the TASK_RUNNING state.

1.1.5. Z (TASK_DEAD - EXIT_ZOMBIE), exit state; the process becomes a zombie

A process is in the TASK_DEAD state while it is exiting. During this exit process, all resources occupied by the process are reclaimed except for the task_struct structure (and a few other resources). The task_struct is thus the only shell left of the process, which is why it is called a zombie. The task_struct is retained because it holds the process's exit code and some statistics, and the parent process is likely to want this information. For example, in the shell, the $? variable holds the exit code of the last foreground process to exit, and this exit code is often used as the condition of an if statement.

Of course, the kernel could also store this information somewhere else and release the task_struct structure to save some space. But keeping the task_struct is more convenient, because the kernel has already established the lookup from pid to task_struct, as well as the parent-child relationships between processes. Releasing the task_struct would require new data structures so that the parent process could still find the exit information of its child.

The parent process can wait for one or more of its children to exit and obtain their exit information through the wait family of system calls (such as wait4 and waitid). These calls also release the child's task_struct as a side effect.

When a child process exits, the kernel sends a signal to its parent, telling it to "reap the corpse". This signal defaults to SIGCHLD, but can be changed when the child is created through the clone system call.
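
To see whether any zombies are currently present, filter the process list for the Z state; the PPID column identifies the parent that has not yet called wait. A sketch (there may well be no matches on a healthy system, and the awk field position assumes the exact -o list shown):

# List zombie processes together with the parent that should reap them
ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/'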

1.2. Average load

The load average is the average number of runnable and uninterruptible processes per unit time, that is, the average number of active processes. It is not directly related to CPU utilization.

Since it is the average number of active processes, ideally there is exactly one process running on each CPU, so that every CPU is fully utilized. For example, what does an average load of 2 mean?

1. On a system with 2 CPUs, it means that both CPUs are exactly fully occupied.

2. On a system with 4 CPUs, it means that the CPUs are 50% idle.

3. On a system with only 1 CPU, it means that half of the processes cannot get CPU time.

1.2.1. What is a reasonable average load?

Ideally, the average load equals the number of CPUs. The command to view the system's CPUs is as follows:

cat /proc/cpuinfo

Figure 1: the load average on a quad-core CPU, viewed with the uptime command (see the sketch below).
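
A sketch of the two checks described here, counting the logical CPUs from /proc/cpuinfo and then reading the three load averages with uptime:

# Number of logical CPUs
grep -c '^processor' /proc/cpuinfo

# The three numbers after "load average:" are the 1-, 5- and 15-minute averages
uptime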

It gives the averages over three different time intervals, which provides a data source for analyzing the load trend of the system, giving a more comprehensive, multi-dimensional view of the current load.

If the 1-minute, 5-minute and 15-minute values are basically the same, or differ only slightly, the system load is stable.

If the 1-minute value is much lower than the 15-minute value, the load has decreased in the last minute, while the load was heavy over the past 15 minutes.

If the 1-minute value is much higher than the 15-minute value, the load has increased in the last minute. Once the 1-minute load average approaches or exceeds the number of CPUs, the system is overloaded.

The uptime command is trimmed out on some embedded devices, but the load average can still be obtained through the proc file system. Command:

cat /proc/loadavg

Obviously, with 4 CPUs, the load average reported by this command shows that the system is overloaded.
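
For reference, /proc/loadavg contains five fields: the 1-, 5- and 15-minute load averages, the number of currently runnable entities over the total number of scheduling entities, and the PID most recently created. A minimal sketch of reading it in a script (nproc comes from coreutils; on busybox-only devices use grep -c '^processor' /proc/cpuinfo instead):

# Fields: 1-min 5-min 15-min runnable/total last-pid
cat /proc/loadavg

# Compare the 1-minute load average against the CPU count
read load1 rest < /proc/loadavg
echo "1-minute load average: $load1 on $(nproc) CPUs"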

1.2.2. Average load and CPU utilization

The average load includes not only processes that are using the CPU, but also processes waiting for the CPU and waiting for I/O.

CPU utilization is a statistic of how busy the CPU is per unit time, and it does not necessarily correspond to the average load. For example:

For CPU-intensive processes, heavy CPU usage raises the average load; in this case the two are consistent.

For I/O-intensive processes, waiting for I/O also raises the average load, but CPU utilization is not necessarily high.

The scheduling of a large number of processes waiting for CPU will also lead to an increase in average load, and the CPU utilization will also be relatively high.

1.3. CPU context switching

Before each task runs, CPU needs to know where the task is loaded and where it starts to run, that is to say, the system needs to set CPU registers and program counters (Program Counter, PC) for it in advance.

CPU registers: small but extremely fast memory built into CPU.

Program counter: is used to store the location of the instruction being executed by CPU, or the location of the next instruction to be executed.

Both are part of the environment the CPU depends on to run any task, and so they are called the CPU context.

Context switching: first save the CPU context of the previous task (that is, CPU registers and program counters), then load the context of the new task into these registers and program counters, and finally jump to the new location referred to by the program counter to run the new task.

Context switching of CPU can be divided into process context switching, thread context switching and interrupt context switching.

1.3.1. Process context switching

According to the privilege level, Linux divides the running space of the process into kernel space and user space.

Kernel space (Ring 0) has the highest permissions and can access all resources directly.

User space (Ring 3) can only access restricted resources and cannot directly access memory or other hardware devices; it must trap into the kernel through system calls in order to access these privileged resources.

1.3.2. The difference between process context switching and system calls

The process is managed and scheduled by the kernel, and the process switching can only occur in the kernel state. Therefore, the context of the process includes not only the user space resources such as virtual memory, stack and global variables, but also the state of kernel space such as kernel stack and register.

In the process of system call, process user resources such as virtual memory are not involved, and processes are not switched.

Process context switching refers to switching from one process to another.

The same process is always running during the system call.

Therefore, a process context switch involves one more step than a system call: before saving the kernel state and CPU registers of the current process, its virtual memory and user stack must be saved; and after loading the kernel state of the next process, its virtual memory and user stack must be refreshed.

1.3.3. When does a process switch occur?

When the process terminates, the CPU it used before will be released, and a new process will be run from the ready queue.

When the time slice of a process is exhausted, it will be suspended by the system and switch to another process waiting for CPU.

When the process is short of system resources (such as insufficient memory), it can not run until the resources are satisfied, at which time the process will be suspended and other processes will be scheduled to run.

When a process suspends itself actively through a method such as the sleep function sleep, it will naturally be rescheduled.

When a process with a higher priority becomes runnable, the current process is suspended so that the higher-priority process can run.

When a hardware interrupt occurs, the process on the CPU is suspended and the interrupt service routine in the kernel runs instead.

1.3.4. Thread context switch

The difference between threads and processes

The thread is the basic unit of scheduling, while the process is the basic unit of resource ownership.

When a process has only one thread, it can be considered that the process is equal to the thread.

When a process has multiple threads, these threads share resources such as virtual memory and global variables. These resources do not need to be modified during context switching.

Threads also have their own private data, such as stacks and registers, which also need to be saved during context switching.

There are two cases of context switching of threads

The threads before and after the switch belong to different processes. In this case, since resources are not shared, the switch is the same as a process context switch.

The threads before and after the switch belong to the same process. In this case, since virtual memory is shared, the virtual memory resources stay in place during the switch; only the thread's private data, registers and other unshared data need to be switched.

1.3.5. Interrupt context switch

Interrupt handling interrupts the normal scheduling and execution of processes. When a process is interrupted, its current state must be saved so that it can resume from where it left off after the interrupt finishes.

The difference between process context switching and interrupt context switching

Interrupt context switching does not involve the user state of the process. Therefore, even if the interrupt process interrupts a process that is in user mode, there is no need to save and restore the process's virtual memory, global variables and other user-state resources. In fact, the interrupt context only includes the necessary state for the kernel interrupt service program to execute, including CPU registers, kernel stack, hardware interrupt parameters and so on.

For the same CPU, interrupt handling has a higher priority than the process.

The similarity between process context switching and interrupt context switching

Both consume CPU; too many switches consume a lot of CPU time and can even seriously degrade the overall performance of the system.

1.3.6. Summary of CPU context switching

CPU context switching is one of the core functions to ensure the normal operation of the Linux system, which generally does not require our special attention.

However, too much context switching will waste CPU time on the preservation and recovery of data such as registers, kernel stack and virtual memory, thus shortening the real running time of the process, resulting in a significant decline in the overall performance of the system.

1.3.7. How to view the context switches of the system

vmstat is a commonly used system performance analysis tool. It is mainly used to analyze the memory usage of the system, but it is also commonly used to analyze the number of CPU context switches and interrupts.

Figure 2: vmstat outputs one set of data every 2 seconds (see the sketch after the field list below).

Four items that require special attention:

cs (context switch): the number of context switches per second.

in (interrupt): the number of interrupts per second.

r (Running or Runnable): the length of the run queue, that is, the number of processes running or waiting for the CPU.

b (Blocked): the number of processes in uninterruptible sleep.
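A sketch of the typical invocation, assuming the procps vmstat is available (on busybox-based devices it may be missing, as noted below); the r, b, in and cs columns are the four fields described above:

# Print one line of statistics every 2 seconds, stopping after 5 samples
vmstat 2 5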

On embedded Linux devices, the vmstat tool is usually not present. If you want vmstat-like output, you can implement it yourself; the principle is to read information from files such as /proc/diskstats and /proc/slabinfo. For the implementation, see the procps tools.

vmstat only gives system-wide context switch figures and cannot show the context switches of each process.

To view the context switches of the threads inside a process, a per-process tool is needed. The following figure shows the context switches of all threads in the hicore process (see the sketch after the two columns below).

Focus on two columns:

1. Voluntary context switches: context switches that occur because the process cannot get the resources it needs, for example when system resources such as I/O or memory are insufficient.

2. Involuntary context switches: context switches that occur when a process is forcibly rescheduled by the system, for example because its time slice has expired or a large number of processes are competing for the CPU.
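
The per-thread view described above is presumably produced with pidstat from the sysstat suite introduced later; its cswch/s column is the voluntary and nvcswch/s the involuntary context-switch rate. A sketch, with <pid> standing in for the hicore process id:

# Per-thread context switches for one process, one sample per second
# cswch/s   = voluntary context switches per second
# nvcswch/s = involuntary context switches per second
pidstat -wt -p <pid> 1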

1.3.9. Procps tool

procps is a set of command-line and full-screen tools built on /proc, a "pseudo" file system dynamically generated by the kernel that provides information about the status of entries in the process table. The file system provides a simple interface to kernel data structures, and the procps programs mostly focus on the data structures that describe the running state of the system's processes.

Procps includes the following programs:

free - reports the amount of free and used memory in the system

kill - sends a signal to a process based on PID

pgrep - lists processes by name or other attributes

pkill - signals processes based on name or other attributes

pmap - reports the memory map of a process

ps - reports process information

pwdx - reports the current directory of a process

skill - obsolete predecessor of pgrep/pkill

slabtop - displays kernel slab cache information in real time

snice - renices a process

sysctl - reads or writes kernel parameters at runtime

tload - visualizes the system load average

top - a real-time dynamic view of running processes

uptime - displays the system's uptime and load

vmstat - reports virtual memory statistics

w - reports logged-in users and what they are doing

watch - runs a program periodically, showing its output full screen

Official website: http://procps.sourceforge.net/

1.3.10. Sysstat tool

This tool set is also absent on embedded Linux devices, and busybox does not provide these commands. You need to install the Linux performance monitoring tool sysstat, which is a collection of tools including sar, sadf, mpstat, iostat, pidstat and so on that can monitor system performance and usage. The function of each tool is as follows:

1. iostat - provides CPU statistics and I/O statistics (disk devices, partitions and network file systems)

2. mpstat - provides statistics for individual or combined CPUs

3. pidstat - provides Linux per-process statistics: I/O, CPU, memory, etc.

4. sar - collects, reports and saves system activity information: CPU, memory, disk, interrupts, network interfaces, TTY, kernel tables, etc.

5. sadc - system activity data collector, used as the backend of sar

6. sa1 - collects daily system activity data and stores it in binary format; it is a frontend of sadc and can be called through cron

7. sa2 - generates the system's daily activity report; it can also be used as a frontend of sadc and called through cron

8. sadf - displays the performance data collected by sar in CSV, XML and other formats, making it easy to import the data into a database or into Excel to generate charts

9. nfsiostat - provides NFS I/O statistics

10. cifsiostat - provides CIFS statistics

sysstat is powerful and its functionality keeps being enhanced; each version provides different features. You can check the sysstat official website for release information and the corresponding manuals. Official website address:

1.3.11. Interrupt

Interrupt is an asynchronous event handling mechanism, which can improve the concurrent processing ability of the system. Interrupt handlers will interrupt the operation of other processes. In order to reduce the impact on the scheduling of normal processes, interrupt handlers need to run as fast as possible.

Linux divides the interrupt handling process into two stages, namely, the upper part and the lower part.

The top half handles the interrupt quickly; it runs with interrupts disabled and mainly does hardware-related or time-sensitive work.

The bottom half defers the work that the top half did not finish and usually runs as a kernel thread.

/proc/interrupts: view the types of hard interrupts that have occurred

Hardware interrupts that occur frequently consume CPU resources. By default, Linux directs all hardware interrupts to CPU0. On a multi-core CPU, distributing a large number of hardware interrupts across different CPUs (cores) clearly gives better balance and performance.
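
A sketch of inspecting and steering hard interrupts; the interrupt number and the CPU bitmask below are placeholders that depend on the device, and writing smp_affinity requires root:

# Per-CPU counters for each hard interrupt source
cat /proc/interrupts

# Pin interrupt number <irq> to CPU1 (hex bitmask 2)
echo 2 > /proc/irq/<irq>/smp_affinity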

1.3.13. Make a specific analysis according to the type of context switching

An increase in voluntary context switches indicates that processes are waiting for resources; other problems such as I/O may be occurring.

More involuntary context switches indicate that processes are being forced to schedule, that is, all processes are competing for CPU, indicating that CPU has indeed become a bottleneck.

The number of interrupts increases, indicating that the CPU is occupied by the interrupt handler, and the specific interrupt type needs to be analyzed by looking at the / proc/interrupts file.

1.4. CPU utilization rate

/proc/stat provides system CPU and task statistics. This information is very raw.

Important indicators related to CPU utilization

The first column: user (us), which represents the user-mode CPU time.

The second column: nice (ni), which represents low-priority user-mode CPU time, that is, the CPU time of processes whose nice value has been adjusted to between 1 and 19. The nice value ranges from -20 to 19; the higher the value, the lower the priority.

The third column: system (sys), which represents kernel-mode CPU time.

The fourth column: idle (id), which represents idle time. Note that it does not include the time spent waiting for I/O (iowait).

The fifth column: iowait (wa), which represents CPU time spent waiting for I/O.

The sixth column: irq (hi), which represents CPU time spent handling hard interrupts.

The seventh column: softirq (si), which represents CPU time spent handling soft interrupts.

In practice, CPU usage is viewed with the top command.
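
A sketch of using top non-interactively; the %Cpu(s) line breaks utilization down into us, sy, ni, id, wa, hi and si, matching the /proc/stat columns above (busybox builds of top may not support batch mode):

# One batch-mode snapshot; the %Cpu(s) line shows us/sy/ni/id/wa/hi/si
top -b -n 1 | head -n 15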

1.5. Soft interrupt

/proc/softirqs provides the running status of soft interrupts.

Note the type of soft interrupt, which is the content of the first column.

Note the distribution of the same soft interrupt across different CPUs, that is, the contents of a single row.

Soft interrupts actually run as kernel threads; each CPU has a corresponding soft-interrupt kernel thread named ksoftirqd/<CPU number>.
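
A sketch of the two checks above, watching the per-CPU soft-interrupt counters and confirming the per-CPU ksoftirqd threads:

# Soft-interrupt counts per type (rows) and per CPU (columns); -d highlights changes
watch -d cat /proc/softirqs

# One ksoftirqd kernel thread per CPU, named ksoftirqd/<cpu number>
ps -e | grep ksoftirqd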

2. Optimization methods

2.1 CPU utilization

CPU utilization describes the percentage of total CPU time that is not idle. Depending on the tasks running on the CPU, it is divided into user CPU, system CPU, waiting for I/O (iowait), soft interrupt, hard interrupt and so on.

User CPU utilization includes user-mode CPU utilization (user) and low-priority user-mode CPU utilization (nice), and represents the percentage of time the CPU runs in user mode. A high user CPU utilization usually indicates that applications are busy.

System CPU utilization represents the percentage of time the CPU runs in kernel mode (excluding interrupts). A high system CPU utilization indicates that the kernel is busy.

The CPU utilization spent waiting for I/O, usually called iowait, represents the percentage of time spent waiting for I/O. A high iowait usually indicates that interactions between the system and hardware devices are taking a long time.

The CPU utilization of soft interrupts and hard interrupts represents the percentage of time that the kernel calls soft interrupt handlers and hard interrupt handlers, respectively. Their high utilization rate usually indicates a large number of interruptions in the system.

2.2 average load (Load Average)

The load average, that is, the average number of active processes in the system, reflects the overall load of the system. It consists of three values, which are the averages over the past 1 minute, 5 minutes and 15 minutes respectively.

Ideally, the average load equals the number of logical CPUs, which means each CPU is exactly fully utilized. If the average load is greater than the number of logical CPUs, the load is heavy.

2.3 process context switching

Voluntary context switching caused by inability to get resources.

Involuntary context switching caused by forced scheduling by the system.

2.4 hit rate of CPU cache

Because CPUs have developed much faster than memory, the CPU's processing speed is much faster than memory access speed. As a result, when the CPU accesses memory it inevitably has to wait for the memory to respond. To bridge this huge performance gap, CPU caches (usually multiple levels of cache) appeared.

To hold the growing amount of hot data, these caches are divided by size into L1, L2 and L3 caches.

L1 and L2 are usually per core, while L3 is shared by multiple cores.

From L1 to L3, the size of each cache level increases and its performance decreases (though it is still much better than memory). Their hit rate measures how well the CPU cache is reused: the higher the hit rate, the better the performance.
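
Cache hit rate is not exposed directly in /proc; one common way to estimate it is with perf, if it is available on the device, by comparing cache references and misses. A sketch, with ./app standing in for the workload being measured:

# Count cache references and misses for one run of the workload
perf stat -e cache-references,cache-misses ./app

# hit rate is roughly 1 - (cache-misses / cache-references)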

2.5 Replacing ptmalloc with tcmalloc

2.5.1 ptmalloc

ptmalloc uses a model of main and non-main allocation arenas. When a thread needs to allocate memory, it finds an unlocked arena in the linked list and allocates memory from it.

Small memory allocation

Inside ptmalloc, memory blocks are managed as chunks, and chunks of similar size are managed in a linked list called a bin. In the first 64 bins, the chunk sizes of adjacent bins differ by 8 bytes; these are called small bins. The rest are large bins; chunks in a large bin are ordered first by size and then by how recently they were used, and each allocation finds the smallest chunk that will do.

The structure of a chunk is shown in the figure. Bit A indicates whether the chunk belongs to the main arena, M indicates whether it came from mmap, and P indicates whether the neighboring previous chunk is in use. If it is not in use, "size of previous chunk" holds the size of the previous chunk; otherwise that field is meaningless (and is reused as allocated memory). ptmalloc uses the P flag and "size of previous chunk" to merge chunks when a block is freed. If a chunk is free, some pointers are also recorded in its mem area to index chunks of adjacent sizes; the implementation details are not repeated here, it is enough to know the general effect.

During free, ptmalloc checks the nearby chunks and tries to merge contiguous free chunks into one large chunk, which is put into the unsorted bin. But when a very small chunk is freed, ptmalloc puts it into a fast bin. Similarly, at certain points, contiguous memory blocks in the fast bins are merged and added to the unsorted bin before going into the normal bins. So when malloc allocates a small block, it first looks in the fast bins, then in the unsorted bin, and finally in the normal bins. If a chunk in the unsorted bin does not fit, it is moved into the appropriate bin.

Large memory allocation

ptmalloc also keeps a top chunk at the top of the allocated memory. If the free chunks in the bins cannot satisfy a request, it tries to allocate from the top chunk. If the top chunk is not big enough either, memory has to be requested from the operating system.

In addition, very large allocations are obtained directly from the system with mmap and are not managed as chunks; when such memory is freed, it is returned to the operating system with munmap.

In short, it is:

Small memory: [get arena and lock] -> fast bin -> unsorted bin -> small bin -> large bin -> top chunk -> extend the heap

Large memory: direct mmap

Summary

Freeing is almost the reverse of allocation, plus some merging of chunks and moving them from one bin to another. And if the free chunk at the top is large enough, the heap top is shrunk and returned to the operating system.

For this reason, there are several considerations when using ptmalloc for memory allocation:

1. Free the most recently allocated memory first, because ptmalloc reclaims memory starting from the top chunk.

2. Avoid frequent allocation and freeing of memory by multiple threads, which leads to frequent lock contention.

3. Avoid allocating long-lived memory blocks, which easily cause internal fragmentation and hinder memory reclamation.

2.5.2 Tcmalloc

The specific implementation is not repeated here; it is easy to look up. The following characteristics are summarized.

tcmalloc has less overhead. For example, allocating N 8-byte objects uses about 8N * 1.01 bytes of space, that is, only about 1% extra space, whereas ptmalloc2 uses at least 8 bytes to describe each chunk.

It is faster. Small objects are allocated almost without locks, while objects larger than 32KB are allocated from the CentralCache using spin locks. Objects larger than 32KB are also page-aligned, so multithreaded code should avoid allocating them frequently; otherwise it causes spin-lock contention and waste from page alignment.
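
Switching an existing binary from ptmalloc to tcmalloc usually does not require code changes; the library can be preloaded or linked in at build time. A sketch, assuming gperftools is installed; the library path and name vary by distribution and toolchain:

# Preload tcmalloc so malloc/free calls bypass glibc ptmalloc
LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so ./app

# Or link it in at build time
gcc -o app app.c -ltcmalloc_minimal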

2.6 Mind map

3. Analysis tools

Start from the CPU performance metrics: when you want to examine a particular metric, you need to know which tools can measure it.

4. Approach

Performance optimization is not free of side effects. In general, a Linux system does not need its parameters adjusted. Performance optimization often increases the overall complexity of the system, reduces portability, and may cause other metrics to go wrong while one metric is being tuned.

Not every performance problem needs optimizing; optimize the bottleneck. For example, if the current system has a bottleneck and user CPU utilization has increased by 10% while system CPU utilization has increased by 50%, then system CPU utilization should be optimized first.

4.1 Application Optimization

From an application perspective, the best way to reduce CPU usage is to eliminate all unnecessary work and retain only the core logic. For example, reduce loop levels, reduce recursion, reduce dynamic memory allocation, and so on.

Several common application performance optimization methods:

Compiler optimization: many compilers provide optimization options; enabled appropriately, they let the compiler improve performance during the compilation phase. At present the optimization option used on the device is -Os, which is roughly equivalent to -O2.5 (see the sketch after this list).

Algorithm optimization: the use of asynchronous processing can prevent the program from blocking because it is waiting for a certain resource, thus improving the concurrent processing ability of the program.

Multithreading instead of multiple processes: thread context switches do not switch the process address space, which reduces the cost of context switching. The device currently uses a multithreaded model.

Using caches: frequently accessed data can be cached in memory so that the next time it is needed it can be fetched directly from memory, speeding up the program.

Smaller dynamic memory use: for small allocations, as long as the stack will not overflow, prefer stack allocation and use dynamic memory allocation less, to improve the program's efficiency.
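
As a concrete illustration of the compiler-optimization item above, a minimal sketch of enabling the size-oriented level mentioned in the text; the file names are placeholders and the exact flags available depend on the toolchain:

# Build with size-oriented optimization (-Os), as used on the device per the text above
gcc -Os -o app app.c

# Build a -O2 variant as well to compare size and performance
gcc -O2 -o app_o2 app.c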

4.2 system optimization

From the system's point of view, to optimize CPU usage we should, on the one hand, make full use of the locality of the CPU caches to speed up memory access, and on the other hand control the CPU usage of processes and reduce the mutual impact between processes.

Common methods:

CPU binding: binding a process to one or more CPUs can improve the CPU cache hit rate and reduce the context switching caused by cross-CPU scheduling (see the sketch after this list).

Priority adjustment: use nice to adjust a process's priority; a positive value lowers the priority, a negative value raises it.

Interrupt load balancing: whether soft or hard, interrupt handlers can consume a lot of CPU. By configuring smp_affinity, interrupt handling can be load balanced across CPUs.

Replace ptmalloc: the glibc library currently in use manages dynamic memory with ptmalloc. The current applications make many small dynamic allocations, so tcmalloc, which is faster and almost lock-free for small allocations, can be used instead.
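
A short sketch of the first three methods in the list above; the PID, CPU list and interrupt number are placeholders, and writing smp_affinity requires root:

# CPU binding: restrict a running process to CPUs 0 and 1
taskset -cp 0,1 <pid>

# Priority adjustment: lower the priority of a running process (positive nice value)
renice 10 -p <pid>

# Interrupt load balancing: steer interrupt <irq> to CPU2 (hex bitmask 4)
echo 4 > /proc/irq/<irq>/smp_affinity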

The above is how to optimize the Linux CPU when it reaches a bottleneck. The editor believes there are knowledge points here that we may see or use in our daily work. I hope you can learn more from this article. For more details, please follow the industry information channel.
