

What are the performance optimization knowledge points in the server?

2025-01-17 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 05/31 Report --

In this article, the editor gives a detailed introduction to "what are the performance optimization knowledge points in the server". The content is detailed, the steps are clear, and the details are handled carefully. I hope this article helps you resolve your doubts; let's follow the editor's line of thought and learn something new.

When we talk about performance optimization, most of us probably think of system-level optimization first. For example, in a Web service, Redis or another cache is used to speed up website access, while the program code itself receives little attention. On the one hand, the compiler already does a lot of optimization work for us; on the other hand, system-level optimization feels more visible and more impressive. In fact, besides optimization at the system level, optimization at the program-code level can also be very effective.

Enough preamble; let's let the facts speak. Take a look at the following two programs. Their function is exactly the same: add one to each element of a two-dimensional array. At a glance, would you expect any performance difference between them? Actual tests show the difference is nearly 4x.
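The article's original listings are not preserved here, so the following is a minimal reconstruction of the two traversal orders, assuming a 1024x1024 `int` array (so one row occupies exactly 4 KB, matching the stride described below); the function names are my own.

```c
#include <time.h>

#define N 1024
static int a[N][N];   /* 1024 * 1024 ints; one row = 4096 bytes */

/* Traversal 1 ("former"): row by row. The inner loop walks a
   contiguous address range, so most accesses hit the cache line
   loaded by the previous access. */
void inc_by_row(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] += 1;
}

/* Traversal 2 ("latter"): column by column. Successive inner-loop
   accesses are N * sizeof(int) = 4096 bytes apart, so each access
   lands on a different cache line. */
void inc_by_column(void) {
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            a[i][j] += 1;
}

/* Rough CPU-time measurement for either traversal. */
double seconds(void (*f)(void)) {
    clock_t t0 = clock();
    f();
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Timing `seconds(inc_by_column)` against `seconds(inc_by_row)` (compiled with `gcc -O0`, so the compiler does not interchange the loops itself) should show the row-wise version clearly faster, on the order of the several-fold gap the article describes.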

Analysis of the causes of performance differences

Think about it: why is there such a big performance difference? Looking at the code, the difference between the two lies in the order in which the array elements are accessed: the former goes row by row and the latter column by column. Figure 1 may make this clearer. Combined with how the C language lays out a two-dimensional array in memory (which can be verified by printing addresses in the code above), we can see that the former accesses a contiguous address range, while the latter accesses addresses that jump.

Figure 1 two forms of access

Take an integer array as an example: the addresses accessed by the former are X, X+4, X+8, and so on, while the latter visits X, X+4096, X+8192, and so on. In other words, the latter jumps over 4KB of address space on every access.

After understanding the above difference, have you guessed the reason for the performance gap? We know that, to improve the performance of memory access, the CPU places a cache between itself and memory. Modern CPU caches usually have three levels, namely L1, L2 and L3, where L1 and L2 are private to each CPU core, while L3 is shared by all cores of the same CPU. The basic architecture is shown in Figure 2.

Figure 2 CPU cache architecture

Because the caches are distributed per core, consistency must be maintained among multiple CPUs, but that is a digression. In short, the cache needs to be managed at a smaller granularity, and this small management unit is called a cache line (comparable to a cached page in the page cache). Because the cache's capacity is much smaller than memory's, the cache cannot hold all of memory's contents. The cache works mainly by exploiting two characteristics of typical data-access patterns: spatial locality and temporal locality.
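As a side note, on Linux with glibc the cache line size can be queried at run time via `sysconf`. This is a platform-specific sketch: `_SC_LEVEL1_DCACHE_LINESIZE` is a glibc extension, and some systems report 0 or -1 when the value is unavailable.

```c
#include <unistd.h>

/* Returns the L1 data-cache line size in bytes (commonly 64 on
   x86-64), or 0 if the system does not report it. */
long l1_line_size(void) {
    long line = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    return line > 0 ? line : 0;
}
```

Knowing the line size helps explain the example above: a 4 KB jump skips far past the line that was just loaded, so the prefetched neighbors are wasted.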

Spatial locality: when a piece of data is accessed, nearby data has a high probability of being accessed in the near future.

Temporal locality: data that has just been accessed has a high probability of being accessed again in the near future.

Knowing the above principle, we can see that in the program code above, the second program's successive accesses jump too far apart; that is, it does not satisfy spatial locality, which causes cache misses. In other words, the second program rarely finds its data in the cache and must go directly to memory. Since memory access is much slower than cache access, we get the nearly 4x performance difference mentioned at the beginning of the article.

Other considerations about program performance

A very small change to a program can have a large impact on performance. Therefore, in daily development, we should always watch for inappropriate code that causes performance problems. Below we give several performance-related examples for reference in future development.

1. Program structure

The impact of an unreasonable program structure on performance can sometimes be catastrophic. The performance difference between the following two functions can be very large when the string is long. The function lower1 recomputes the length of the string on every loop iteration, which is unnecessary. The function lower2 computes the length once before the loop starts and then tests the loop condition against that saved value. The root of the problem is the strlen function, which walks the string to determine its length, and that walk is expensive when the string is long.
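The function pair the article refers to is not reproduced above; the following is a reconstruction of the classic lower1/lower2 example (converting a string to lowercase), assuming that is the code being discussed.

```c
#include <string.h>
#include <ctype.h>

/* lower1: strlen() is re-evaluated on every iteration, so the loop
   is O(n^2) for a string of length n. */
void lower1(char *s) {
    for (size_t i = 0; i < strlen(s); i++)
        s[i] = (char)tolower((unsigned char)s[i]);
}

/* lower2: the length is computed once before the loop, making the
   whole function O(n). */
void lower2(char *s) {
    size_t len = strlen(s);
    for (size_t i = 0; i < len; i++)
        s[i] = (char)tolower((unsigned char)s[i]);
}
```

Both functions produce the same result; only the cost of the loop condition differs, and the gap grows quadratically with the string length.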

2. Procedure (function) calls

We know that procedure calls involve pushing to and popping from the stack, which are memory operations, and the whole process is relatively involved. In other words, a function call is a comparatively expensive operation, so we should minimize unnecessary function calls.

Fortunately, modern compilers can optimize function calls heavily, and simple calls can usually be optimized away by the compiler. "Optimized away" means that at the machine language (assembly) level there is no longer any actual call corresponding to the high-level language function call.

Let's look at a concrete example: a simple call chain in C, where the function fun_1 calls the function fun_2, and fun_2 calls printf. fun_2 does little work here, just adding its two parameters and passing the result to printf.
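The listing from the figure is not reproduced here; a minimal sketch consistent with the description might look like this (the names fun_1 and fun_2 come from the article, but the exact bodies are assumptions).

```c
#include <stdio.h>

/* fun_2 adds its two parameters and hands the result to printf. */
int fun_2(int a, int b) {
    int c = a + b;
    printf("%d\n", c);
    return c;
}

/* fun_1 merely forwards to fun_2. At -O2, gcc typically inlines the
   whole chain, so fun_1's assembly calls printf directly with a + b. */
int fun_1(int a, int b) {
    return fun_2(a, b);
}
```

Comparing `gcc -S demo.c` with `gcc -O2 -S demo.c` should show the call to fun_2 disappearing from fun_1, as described for Figure 3.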

Figure 3 function call optimization

As shown in the figure, when gcc applies no optimization, the disassembled code (lower left of Figure 3) follows the whole logic plainly, calling each function step by step. After -O2 optimization, however, the assembly becomes very concise (lower right of Figure 3): from fun_1's assembly you can see that it does not call fun_2 at all but calls printf directly. So the compiler can optimize function calls away without affecting behavior. This is not guaranteed, though; with a slightly more complex call the compiler may be powerless, and that can cost performance.

3. Operator differences

The cost of different operations also varies widely. For example, multiplication takes two or three times as long as addition, while division takes more than ten times as long as addition. Therefore, in frequently executed logic, reducing the use of division brings a noticeable improvement.
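For power-of-two divisors, division and modulo reduce to a shift and a mask. Compilers already apply this transformation for constant divisors, but the identity is worth knowing when you control the divisor. A small sketch, with function names of my own choosing:

```c
/* For unsigned x and d == 1u << k (a power of two):
   x / d == x >> k, and x % d == x & (d - 1). */
unsigned div_pow2(unsigned x, unsigned k) { return x >> k; }
unsigned mod_pow2(unsigned x, unsigned k) { return x & ((1u << k) - 1); }
```

This mask trick is exactly what makes the HashMap indexing scheme below cheap.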

In Java's HashMap implementation, the bucket index for a key is computed with bit operations rather than a modulo operation, because modulo is itself a division and is more than ten times slower than the bit operation. (The table size is kept a power of two, so (n - 1) & hash is equivalent to hash % n.) The hash function below additionally spreads the high bits of the hash code into the low bits:

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

4. Reference and copy

High-level languages that support classes involve copying when an object is passed as a parameter, and copying an object is also a performance-consuming operation. Of course, high-level languages can pass an object's address instead, through a mechanism known as a reference, thereby avoiding the copy (this is the difference between passing by value and passing by reference).
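C has no references, but the same trade-off shows up as pass-by-value versus pass-by-pointer; a small sketch (the struct and function names are illustrative):

```c
#include <stddef.h>

/* A deliberately large struct: passing it by value copies all 4 KB. */
struct Big { int data[1024]; };

/* By value: the callee works on a full copy of the struct. */
int sum_by_value(struct Big b) {
    int s = 0;
    for (size_t i = 0; i < 1024; i++)
        s += b.data[i];
    return s;
}

/* By pointer: only an address is passed, so nothing is copied. */
int sum_by_pointer(const struct Big *b) {
    int s = 0;
    for (size_t i = 0; i < 1024; i++)
        s += b->data[i];
    return s;
}
```

Both functions return the same result; only the cost of getting the argument into the callee differs, and that cost grows with the size of the object being copied.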

This concludes "what are the performance optimization knowledge points in the server". To truly master these points, you still need to practice and apply them. If you want to read more related articles, welcome to follow the industry information channel.
