2025-01-16 Update From: SLTechnology News&Howtos
This article walks through a case study in Linux performance optimization. Many people run into these situations in real-world operations, so follow along; I hope you read carefully and take something away from it.
Since the last modification to the backlog, Silly's IO throughput has trailed Redis by a small margin (around 40.6K), but I had not found the reason. This time I decided to profile from scratch to pin down the problem.
First I used GNU gprof to check whether the C code was worth optimizing, and found that the highest CPU usage was inside the Lua VM and in malloc/free. All of our business logic runs in the Lua layer, and the IO thread exchanges data with the worker (Lua) thread through malloc'd buffers, so the C code has almost no room for optimization. The good news is that gprof does not profile system calls or shared libraries, so there may still be a chance elsewhere.
During the stress test, first use top to look at the utilization of each CPU core, both user mode and kernel mode. High user-mode usage is generally consumed by application logic, while high kernel-mode usage may come from too many system calls, frequent context switches, and similar causes. However, top only shows the CPU status at one moment; it cannot show the curve of CPU consumption over the whole test interval. Instead, use `sar -u -P ALL 1` to print per-core CPU usage once per second.
Through observation, I found that the kernel-mode usage of one thread was as high as 70%, much higher than Redis's.
Using vmstat to check the in (interrupts) and cs (context switches) columns confirmed that both increased significantly throughout the stress-test interval, which was presumed to be caused by system calls.
To further confirm that these interrupts and context switches were caused by Silly, use `pidstat -w -p PID 1` to print a thread's context-switch frequency.
Once confirmed, use `strace -p $PID -c -f` to collect statistics on all system calls made by the process, then optimize in a targeted way based on the collected information.
If all of the above is done and there is still no room for optimization, it doesn't matter: we also have the powerful perf to inspect cache hit rates, branch-misprediction rates, CPU scheduling migrations, and other information closely tied to the CPU.
If all of that has been done and no optimization opportunity has been found, another common but easily overlooked symptom is that both user-mode and kernel-mode CPU usage are very low. In this case there is usually a queue between programs or between clusters (the queue can be anything with FIFO behavior, such as a socket buffer), and one end of the queue is too slow to drain it (for example, the consumer is stuck for some reason without burning CPU). After producing a request, the producer side sits idle because it has not received a response. The overall behavior looks strange, as if the machine had suddenly gone quiet.
Finally, if you find that the application-layer code really cannot be optimized any further, don't worry: there may still be a few free lunches you haven't eaten yet.
jemalloc
An excellent memory allocator that performs well even under heavy multithreading. Taking the optimized Silly as an example, after switching the memory allocator to jemalloc 5.0, request-processing speed improved significantly.
__builtin_expect
This GNU built-in function hints to GCC which branch is more likely to be taken, so GCC can lay out code that helps the CPU's branch prediction. When the two outcomes of a branch differ greatly in probability (as with exception handling), it can improve performance, depending on the situation. A test of one such case was shown in the previous section.
CPU affinity
The Linux kernel provides interfaces that let applications fine-tune kernel behavior, including scheduling. CPU affinity lets us modify the set of CPUs a process or thread may run on, hinting to the kernel to avoid (or at least limit) CPU migrations; migration counts can be obtained through the perf tool.