2025-02-24 Update From: SLTechnology News&Howtos
Performance optimization means making a program run faster without affecting its correctness. It is a very broad topic: software products vary widely, and many factors influence how efficiently a program executes, so performance optimization, especially on an unfamiliar project, is not an easy task.
Performance optimization can be divided into macro and micro levels. The macro level includes architectural refactoring; the micro level includes algorithm optimization, compiler optimization, tool-based analysis, and high-performance coding. The micro-level methods are largely independent of specific business logic, so they apply more broadly and are easier to put into practice.
When it comes to methodology, the first step is to establish metrics: you get what you measure. Performance optimization must therefore be measurement-driven and based on data, never on guesswork; this is a basic principle of optimization. Building a realistic stress-test environment, or one that approaches the real environment, can be difficult and time-consuming, but it is still worth doing.
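As a minimal illustration of "measure first" (the function names and workload here are made up for the example), Python's standard `timeit` module can compare two implementations with data rather than guesses:

```python
import timeit

# Hypothetical micro-benchmark: two ways to build the same string.
def concat_plus(n=1000):
    s = ""
    for _ in range(n):
        s += "x"          # repeated reallocation
    return s

def concat_join(n=1000):
    return "".join("x" for _ in range(n))  # single final allocation

# Let the numbers, not intuition, decide which is faster on this machine.
t_plus = timeit.timeit(concat_plus, number=200)
t_join = timeit.timeit(concat_join, number=200)
print(f"+= : {t_plus:.4f}s  join: {t_join:.4f}s")
```

The absolute timings depend on the machine; the point is that any optimization claim should be backed by a measurement like this.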
Many tools can help us locate a program's bottlenecks, and some present the results in a very friendly graphical form. Locating the problem is a prerequisite for solving it, but it is not necessarily the hardest part; analysis and optimization are the most time-consuming steps. After making a change, we must return to testing to verify that it is as effective as expected.
What is a high-performance program? One that is sound and far-sighted in its architecture and meticulous in its implementation.
The key to architectural optimization is identifying bottlenecks. There are many techniques of this kind: distributing load through load balancing, parallelizing with multithreading or coroutines, asynchronizing and decoupling with message queues, replacing polling with event notification, adding caches in front of data access, raising throughput with batching and prefetching, separating IO from logic, separating reads from writes, and so on.
Although architectural adjustment and optimization are very effective, they are not always feasible, because a variety of practical constraints limit them.
Avoid doing what need not be done, and do efficiently what must be done: that is a fundamental rule of performance optimization. Increasing processing capacity and reducing the amount of computation can be seen as its two directions.
How do we make a program run faster? We must make full use of the hardware's features: reduce waiting, increase concurrency, improve cache hit rates, and use more efficient structures and algorithms. Reducing the amount of computation, on the other hand, may mean stepping outside the purely technical realm and asking, from a product and business perspective, which features are necessary and which are optional.
Sometimes we must improve the program at the level of detail. In general, prefer simple data structures and algorithms, but switch to more efficient ones when needed; not only the logical structure but also the physical layout (the implementation) affects execution efficiency. The effects of branch prediction, feedback-directed optimization, and heuristic or machine-learning-based compiler optimization are becoming increasingly significant. Proficiency in the programming language and a deep understanding of the standard library's implementation help us avoid low-performance traps, and code fine-tuning, even instruction-level optimization, can sometimes yield unexpected gains.
Sometimes we must make trade-offs, such as trading space for time, or sacrificing some generality and readability for performance. We should do this only when truly necessary; that is the art of the trade-off.
## 1. Architecture optimization
# load balancing
Load balancing is, in essence, about sharing work. In a distributed system, a load balancer is usually placed in front of the logical servers; NGINX is a classic solution. But load balancing is not limited to distributed systems: a server with a multithreaded architecture also needs to balance load so that each worker thread stays evenly busy.
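To make the idea concrete, here is a minimal sketch (class and worker names are illustrative, not from any real balancer) of two common strategies for spreading requests over workers, whether those workers are backend servers or threads:

```python
import itertools

class RoundRobin:
    """Hand each request to the next worker in a fixed cycle."""
    def __init__(self, workers):
        self._cycle = itertools.cycle(workers)

    def pick(self):
        return next(self._cycle)

class LeastLoaded:
    """Hand each request to the worker with the fewest outstanding requests."""
    def __init__(self, workers):
        self.load = {w: 0 for w in workers}

    def pick(self):
        worker = min(self.load, key=self.load.get)
        self.load[worker] += 1
        return worker

rr = RoundRobin(["w1", "w2", "w3"])
print([rr.pick() for _ in range(4)])  # ['w1', 'w2', 'w3', 'w1']
```

Round-robin is trivial but ignores uneven task sizes; least-loaded adapts to them at the cost of tracking per-worker state, which mirrors the busy-vs-idle trade-off discussed later for worker threads.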
# Parallelization with multithreading and coroutines
Although the complexity of hardware architecture places higher demands on program development, programs that make full use of multiple CPUs and multiple cores can achieve impressive gains. Under the same hardware specification, a parallel redesign based on multithreading or coroutines is therefore well worth trying.
Multithreading inevitably faces resource contention. Our design goal should be to make full use of the hardware's multiple execution cores, reduce waiting, and keep all execution flows running smoothly and quickly.
In a multithreaded model, if each piece of work is abstracted as a task and each worker thread as a worker, there are two typical designs: partition the task types so that each class of worker handles one specific kind of task, or let every worker handle every kind of task.
The first design reduces data contention and is easier to code, since only a limited set of contention points must be identified for the system to work well. Its drawback is that task workloads are likely to differ, so some workers may be busy while others sit idle.
The second design has the advantage of balance and the disadvantages of higher coding complexity and more data contention.
Sometimes we combine the two modes: for example, a dedicated thread does IO (sending and receiving packets) plus deserialization (producing protocol tasks), and a pool of worker threads processes those tasks, with the two sides connected by a task queue. This is the classic producer-consumer model.
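The producer-consumer model just described can be sketched as follows (a simplified, self-contained version: the "IO thread" just generates integers, and "processing" is a stand-in computation):

```python
import queue
import threading

NUM_WORKERS = 4
tasks = queue.Queue()          # the task queue connecting the two sides
results = []
results_lock = threading.Lock()

def producer(n):
    """Stands in for the IO + deserialization thread."""
    for i in range(n):
        tasks.put(i)           # each item models a deserialized protocol task
    for _ in range(NUM_WORKERS):
        tasks.put(None)        # one stop sentinel per worker

def worker():
    """One of the pool of worker threads consuming tasks."""
    while True:
        item = tasks.get()
        if item is None:
            break
        with results_lock:
            results.append(item * item)   # stand-in for real processing

threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()
producer(100)
for t in threads:
    t.join()
print(len(results))  # 100
```

`queue.Queue` is already thread-safe, so the queue itself is the only contention point the design has to reason about, which is exactly the appeal of this pattern.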
A coroutine is a user-mode execution flow; its value rests on the assumption that switching tasks in user mode costs less than switching system threads.
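A small sketch of that assumption in Python's `asyncio` (the `fetch` function is a made-up placeholder for a non-blocking operation): spawning ten thousand coroutines is cheap, whereas ten thousand OS threads would be prohibitive.

```python
import asyncio

async def fetch(i):
    await asyncio.sleep(0)   # yield to the scheduler, simulating a wait
    return i

async def main():
    # Ten thousand concurrent coroutines, all multiplexed in user mode.
    results = await asyncio.gather(*(fetch(i) for i in range(10_000)))
    return sum(results)

total = asyncio.run(main())
print(total)  # 49995000
```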
# Notification instead of polling
Polling means asking over and over, like walking to the mail room every few minutes to check whether a letter has arrived; notification means asking the clerk to call you when one does. Polling burns CPU, so a notification mechanism is clearly more efficient.
# add cache
Caching is grounded in the principle of locality.
In general, write requests are far fewer than read requests, so introducing a cache cluster fits write-rarely, read-often scenarios very well.
When writing to the database, write the same data to the cache cluster at the same time, then let the cache cluster absorb most of the read requests. Because a cache cluster easily achieves high performance, it can carry higher concurrency with fewer machines.
Cache hit rates are generally high, latency is low, and throughput is strong (tens of thousands of concurrent requests are easy on a single machine), making this an ideal solution.
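The write path described above (write to the DB and the cache together, serve reads from the cache) can be sketched like this; `db` and `cache` are plain dicts standing in for a real database and cache cluster:

```python
db = {}
cache = {}
cache_hits = 0

def write(key, value):
    db[key] = value        # the slow, durable store
    cache[key] = value     # keep the cache in sync on the write path

def read(key):
    global cache_hits
    if key in cache:       # fast path: the cache cluster serves the read
        cache_hits += 1
        return cache[key]
    value = db.get(key)    # miss: fall back to the database
    if value is not None:
        cache[key] = value # populate the cache for subsequent reads
    return value

write("user:1", "alice")
for _ in range(100):
    read("user:1")
print(cache_hits)  # 100
```

With one write and a hundred reads, every read is a cache hit, which is why the write-rarely, read-often workload is the sweet spot for this pattern.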
A CDN is essentially a cache; caching the static resources that users access in a CDN is common practice.
# message queue
Message queues and messaging middleware are used to make write requests asynchronous: once the data is written to the message queue, the write is considered complete, and the MQ drains it to the DB at a slower pace, smoothing out traffic peaks ("peak shaving and valley filling").
A message queue is also a means of decoupling; here it is used mainly to relieve write pressure.
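A minimal in-process sketch of this write-behind idea (the real thing would use middleware such as Kafka or RabbitMQ; `write_async` and `db_writer` are made-up names): the caller returns as soon as the record is enqueued, while a background consumer drains the queue into the slow store at its own pace.

```python
import queue
import threading
import time

mq = queue.Queue()   # stands in for the message queue middleware
db = []              # stands in for the slow database

def write_async(record):
    mq.put(record)   # "write complete" from the caller's point of view

def db_writer():
    """Background consumer draining the queue into the DB."""
    while True:
        record = mq.get()
        if record is None:        # shutdown sentinel
            break
        time.sleep(0.001)         # simulate a slow DB write
        db.append(record)

consumer = threading.Thread(target=db_writer)
consumer.start()
for i in range(50):
    write_async(i)                # a burst of writes, absorbed by the queue
mq.put(None)
consumer.join()
print(len(db))  # 50
```

The burst of fifty writes completes almost instantly from the producer's side; the queue buffers the peak and the DB sees a steady trickle instead.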
# Separating IO from logic, and reads from writes
The separation of IO from logic was discussed above. Read-write separation is a common way for databases to cope with load, and of course it is not limited to databases.
# batch processing and data prefetching
Batch processing is an idea with many applications. Take batched handling of network packets: received packets are saved up and then pushed through the processing pipeline together, so the same function is called, or the same stretch of code executed, many times in a row, giving excellent i-cache locality. If the batched code also touches the same data repeatedly, d-cache locality improves as well. Batching naturally improves performance and raises throughput, but it usually increases latency.
Another application of batching is log writing. A single log entry may be only a few dozen bytes; we can buffer entries and write them to disk in bulk for better performance. This introduces a risk of data loss, though that risk can usually be mitigated, for example with shared memory (shm).
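The batched log writer can be sketched as follows (class and parameter names are illustrative; `disk` is a list standing in for the log file):

```python
class BatchedLog:
    """Buffer log entries in memory; flush them to the sink in batches."""
    def __init__(self, sink, batch_size=32):
        self.sink = sink            # stand-in for the disk file
        self.batch_size = batch_size
        self.buffer = []

    def write(self, entry):
        self.buffer.append(entry)
        if len(self.buffer) >= self.batch_size:
            self.flush()            # one large write instead of many small ones

    def flush(self):
        if self.buffer:
            self.sink.append("".join(self.buffer))
            self.buffer = []

disk = []
log = BatchedLog(disk, batch_size=10)
for i in range(25):
    log.write(f"line {i}\n")
log.flush()                          # flush the tail on shutdown
print(len(disk))  # 3 writes instead of 25
```

Twenty-five entries become three physical writes. The entries sitting in `buffer` are the data-loss window the text mentions: a crash before `flush()` loses them, which is what a shared-memory backing buffer is meant to protect against.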
Instruction prefetching is done automatically by the CPU; data prefetching is delicate work. It rests on the assumption that the prefetched data will be used by an upcoming operation, in line with the principle of spatial locality. Data prefetching can keep the pipeline full and reduce memory-access stalls, but it intrudes on the code and is not always as effective as hoped.
Even without explicit prefetch code, the hardware prefetcher may do the job for you, and gcc has a compilation option that inserts prefetch code automatically at compile time. Hand-written prefetching needs careful handling: the timing is critical, and decisions must be grounded in test data. Even when it performs well at first, later code changes may erode the effect, and the prefetch instructions themselves carry overhead; prefetching is worthwhile only when its benefit exceeds its cost.
I'm tired, so I'll stop here for now; the chapters outlined below will be filled in another time.
## 2. Algorithm optimization
# Hashing
# Hash vs. string comparison
# HashMap
# Hash vs. balanced search tree
# Binary search over a sorted array
# Data structure optimization
# Lazy evaluation & copy-on-write
# Precomputation
# Incremental updates
## 3. Code optimization
# Memory optimization
Small object allocator
Separation of memory allocation and object construction
# cache optimization
I-cache optimization, d-cache optimization, cache-line alignment, struct field reordering
# Checking prefixes first
# Replacing many small operations with one bulk operation
# reuse
# Subtraction (doing less work)
# reduce redundancy
# reduce copy and zero copy
# reduce the number of parameters (register parameters, depending on the ABI convention)
# reduce the number of function calls / levels
# reduce the number of storage references
# reduce invalid initialization and repeated assignment
# Loop Optimization
# Defensive programming in moderation
# Clean release
# use Recursion carefully
## 4. Compiler optimization
# inline
# restrict
# LTO
# PGO
# Optimization options
## 5. Other optimizations
# Core binding (CPU affinity)
# SIMD
# Locks and concurrency
# Lock granularity
# Lock-free programming
# Per-cpu data structure & thread local
# memory barrier
## Summary
Performance optimization is painstaking work. Engineers have long sought shortcuts that would solve performance problems once and for all, but unfortunately there is no silver bullet. That does not mean optimization is haphazard: software engineers have accumulated a great deal of practical experience with architecture, caching, prefetching, tooling, compilers and programming languages, and code refactoring, and these methods and discussions are well worth consulting.
Performance optimization is also a systems engineering effort. Waiting for a bottleneck to appear and then optimizing is a "pollute first, clean up later" approach. A better way is to run performance through the software's entire life cycle: treat performance as a requirement, even a key goal, at design time; continuously monitor performance changes and strictly follow high-performance coding standards during development; and fold performance into the maintenance regime afterwards.
Strictly speaking, performance optimization differs from performance design. Optimization improves an existing system and code base rather than starting over, testing the developer's ability to repair after the fact, while performance design tests the designer's forward-looking ability. Still, optimization methods can inform performance design, and the two complement each other.