Hello, everyone. I'm Brother Z.
I haven't written a technical article for a long time. Recently, I happened to think about it and write it out and share it with you.
Once a system reaches a certain scale, especially a consumer-facing (To C) system, you will sooner or later be forced to do performance optimization. Otherwise the system cannot support the growth of the business, and technology becomes a drag on the business instead of something that leads it.
And once a performance problem shows up in production, it can be tricky. It is different from a bug in a business function: the latter is relatively clear to analyze and fix; as long as the logging is in place, you can follow a known line of business logic and usually find the root cause.
Performance problems are far more complex. They have many possible causes and are often the result of several factors acting together, for example low code quality, business growing too fast, or an unreasonable architecture design.
And in general, performance problems are time-consuming to deal with, and the chain to analyze can be very long, especially when it crosses into upstream and downstream systems owned by other teams. Many people are unwilling to take this on, or feel powerless, and at most apply a few temporary remedies to try their luck: add more machines, pull out the restart trick, and so on.
Such temporary remedies sometimes not only fail to solve the problem, but also bury new hidden risks.
For example, a program may appear to have a performance problem because of insufficient resources. Temporarily giving it more resources may look like it solves the problem, but if the real cause is that the program uses resources unreasonably, adding resources only delays the problem, and may also eat into the resources other programs need to run.
To avoid falling into such a dilemma, we should do performance optimization ahead of time and prepare in advance. It is even better to treat it as a recurring piece of work.
Next, let me share my thoughts on performance optimization.
/ 01 Clarify the purpose of optimization /
Many people gradually slip into optimizing for the sake of optimization, losing sight of the purpose, or never thinking about it in the first place. That way you end up trapped in a meaningless performance black hole, unable to pull yourself out, endlessly chasing better-looking performance metrics.
The purpose of optimization can be to improve the user experience, for example eliminating pages and actions with obvious stutter. It can also be to save server bandwidth and reduce server load. In any case, you need a purpose.
/ 02 Set a standard: how far to go /
Optimization is endless. To avoid getting caught in the meaningless performance black hole mentioned earlier, we had better define a relatively objective standard, based on the actual business situation, that says how far the optimization needs to go.
My own rule of thumb is to guarantee 50% of headroom on top of the expected growth, and to aim for 100% if conditions permit.
For example, suppose comparing current performance metrics with business volume shows that peak concurrency may grow to more than 2 times the current level, i.e. an expected increase of 1x. Adding 50% to that increase gives 2.5x, and adding 100% gives 3x, so striving for a 3x improvement, with 2.5x as the minimum, is a reasonable standard.
I have written a dedicated article on capacity estimation, "Do 'capacity estimation' without true and false", which is linked at the end of this article, so I won't expand on it here.
/ 03 Find the bottleneck /
Many people, when doing optimization, dive straight into changing code. Indeed, with some accumulated knowledge it is easy to spot in the code that writing something the way of method A is not as good as the way of method B.
But in most cases, "process optimization beats syntax-level optimization". For example, changing every string concatenation to use StringBuilder usually brings only a tiny gain, and in some cases it is better not to change it at all.
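To make the point concrete, here is a minimal, hypothetical Java sketch of that syntax-level tweak; the names are invented for illustration, and the gain is usually small compared with fixing the surrounding process:

```java
import java.util.List;

// Illustrative only: a syntax-level tweak that usually matters far less
// than fixing the surrounding process. Names are invented for this sketch.
public class ConcatExample {
    // Naive concatenation: each "+=" may allocate a new String.
    static String joinNaive(List<String> parts) {
        String result = "";
        for (String p : parts) {
            result += p;
        }
        return result;
    }

    // StringBuilder variant: one mutable buffer, fewer allocations.
    static String joinBuffered(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append(p);
        }
        return sb.toString();
    }
}
```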
Therefore, we should use objective data to learn more about the running environment and find the shortest stave of the whole "barrel". For example, with the overall architecture of the system and the specifications of the servers in hand, it is much easier to locate the performance bottleneck.
The "process" in "process optimization beats syntax-level optimization" includes not only business processes but also technical processes, such as how data flows across the network.
/ 04 Start optimizing /
Finally, we should start to optimize.
Two common misconceptions need to be avoided when optimizing.
First, do not excessively pursue the single-machine performance of an application; even if single-machine performance is good, you should still think from the perspective of the whole system.
Second, do not excessively pursue extreme optimization along a single dimension, for example chasing CPU performance while ignoring a memory bottleneck.
The correct thinking is generally in line with the following two directions.
First, trade space for performance. If one node cannot handle the load, add another node; if a single copy of the data causes fierce contention for resources, keep an extra copy of the data.
Second, trade distance for performance. If data comes back to the client slowly after layers of processing on the server side, can it be stored directly on the client, or at least somewhere as close to the client as possible?
All right, with the overall approach clear, I suggest you proceed in the order of the subheadings below. It is the same whether you are proactively optimizing performance or passively troubleshooting a performance problem.
/ 01 Application level /
Whether you are willing to admit it or not, most real-world performance problems are caused by code in the application itself.
We are always reluctant to admit our own mistakes. I have seen too many programmers habitually blame the hardware or the network, only for the root cause to turn out to be in the application code.
Therefore, we should start with the application itself. Moreover, the application sits further "upstream" and is more within our control, so we have more means available to optimize it.
01 Caching
First of all, the most common technique is "caching", a classic case of trading space for performance.
Data must ultimately live in a non-volatile database, but for data that is accessed frequently, copying it from the database into volatile memory as a cache can greatly improve access performance. We all understand this, so I won't dwell on it.
It is worth reminding you, though, that the data-structure design of cached data matters a great deal, and no single structure is a silver bullet. Trade-offs are needed: the simpler and more uniform the structure, the more post-processing the cached data needs after it is read; conversely, if you cache all of the "result data", the amount of redundant data becomes too large (and keeping the cache up to date also becomes troublesome).
Another reminder: if the volume of cached data is large, also consider adding a cache-eviction algorithm, otherwise the hit rate will be poor and a lot of memory will be wasted.
Several cache-related articles in the earlier "Distributed Systems" series cover these details; the links are at the end of this article.
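As a rough illustration of the eviction point (not code from that series), here is a minimal Java sketch of an in-process LRU cache built on LinkedHashMap; a real deployment would more likely use something like Caffeine or Redis, which add expiry, statistics, and stronger concurrency control:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A minimal sketch of an in-process cache with LRU eviction,
// assuming a small, single-node use case.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruCache(int maxEntries) {
        // accessOrder=true makes iteration order follow recent access.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once the cap is exceeded.
        return size() > maxEntries;
    }
}
```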
02 Async
Take an example from everyday life: you order a cup of milk tea on your phone and go to the store to pick it up, only to find there are still 20 orders ahead of you. Would you stand there and wait for half an hour?
I think most people would rather go do something else first. Async improves performance the same way: by avoiding "waiting".
There are two main ways to do async.
Async through threads. This is mainly used where I/O is involved, such as disk I/O and network I/O. Once an I/O operation starts, the actual work is carried out elsewhere (by the disk or the network card), so there is no need for the CPU to sit idle in the meantime; let it do something else.
Async through middleware, such as MQ. This applies at a larger granularity, for example the hand-off between upstream and downstream systems in a process. If some results do not need to be received in real time, going async through MQ can greatly improve performance. After all, MQ performance is closer to that of NoSQL and is naturally much higher than that of a relational database. What's more, part of the business logic is deferred to later, so the performance right now looks better.
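Here is a minimal Java sketch of the first approach, thread-based async; fetchFromRemote is a hypothetical stand-in for any blocking disk or network call:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// A minimal sketch of thread-based asynchrony: hand a slow I/O call to a
// worker pool so the caller does not block while waiting.
public class AsyncSketch {
    private static final ExecutorService IO_POOL = Executors.newFixedThreadPool(8);

    static CompletableFuture<String> fetchAsync(String key) {
        return CompletableFuture.supplyAsync(() -> fetchFromRemote(key), IO_POOL);
    }

    // Hypothetical placeholder for a blocking network or disk call.
    static String fetchFromRemote(String key) {
        try { Thread.sleep(100); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return "value-for-" + key;
    }

    public static void main(String[] args) {
        fetchAsync("order-42").thenAccept(v -> System.out.println("got " + v));
        System.out.println("caller keeps doing other work...");
        IO_POOL.shutdown(); // already-submitted work still completes
    }
}
```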
03 Multithreading & distributed
Both of these embody the idea of "divide and conquer". One courier delivering 1,000 parcels is slow, so naturally you have 10 couriers each deliver 100 parcels at the same time.
But don't divide too aggressively. Each extra thread is one more free-range child to look after: let loose too many and the management cost becomes very high, and things may even get slower. That cost is thread switching, and distributed systems have a similar management cost.
One piece of advice: do not introduce distribution until single-machine multithreading can no longer cope. The network is unreliable, and you will have to do a lot of extra work to compensate for it.
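A minimal Java sketch of the single-machine "divide and conquer" idea, using the courier example with a deliberately bounded pool; deliver is a hypothetical placeholder for the real per-item work:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Split the work (the 1000 parcels) across a bounded pool (the 10 couriers).
// The pool size is capped on purpose: each extra thread adds switching cost.
public class ParallelDelivery {
    public static void main(String[] args) throws InterruptedException {
        List<Integer> parcels = IntStream.rangeClosed(1, 1000)
                .boxed().collect(Collectors.toList());

        ExecutorService couriers = Executors.newFixedThreadPool(10);
        for (Integer parcel : parcels) {
            couriers.submit(() -> deliver(parcel));
        }
        couriers.shutdown();
        couriers.awaitTermination(1, TimeUnit.MINUTES);
    }

    static void deliver(int parcel) {
        // Stand-in for the real per-item work.
    }
}
```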
04 Delayed operations
This is the opposite of the idea behind caching: delay some operations until they are actually needed. The applicable scenario is also the opposite of caching's: it suits data that is used infrequently but is expensive to produce.
Lazy loading, plugins, and so on are embodiments of this idea.
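For illustration, a minimal Java sketch of the lazy idea: the expensive value is only computed on first use, not at startup. The Supplier passed in stands for whatever costly operation you want to defer (the names in the usage comment are hypothetical):

```java
import java.util.function.Supplier;

// A minimal sketch of delaying work until it is actually needed.
// Usage (hypothetical): Lazy<Report> report = new Lazy<>(() -> buildExpensiveReport());
public class Lazy<T> {
    private final Supplier<T> supplier;
    private volatile T value;

    public Lazy(Supplier<T> supplier) {
        this.supplier = supplier;
    }

    public T get() {
        T result = value;
        if (result == null) {
            synchronized (this) {
                result = value;
                if (result == null) {
                    value = result = supplier.get(); // computed once, on first use
                }
            }
        }
        return result;
    }
}
```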
05 Batching and merging
If you need to send multiple pieces of data to the same destination frequently within a short period of time, try to package them together and send them in one go, especially where I/O is involved.
If the system at hand is still a single-node system, this move is extremely cost-effective: performance improves while the complexity of a distributed system is avoided.
Bulk database operations and front-end image sprites are embodiments of this idea.
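A minimal Java sketch of the batching idea, assuming a hypothetical bulk flush target (for example a bulk insert or a bulk HTTP call) passed in as a Consumer:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Accumulate items and flush them in one call instead of sending each
// item to the same destination one by one.
public class Batcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> flushTarget;
    private final List<T> buffer = new ArrayList<>();

    public Batcher(int batchSize, Consumer<List<T>> flushTarget) {
        this.batchSize = batchSize;
        this.flushTarget = flushTarget;
    }

    public synchronized void add(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    public synchronized void flush() {
        if (!buffer.isEmpty()) {
            flushTarget.accept(new ArrayList<>(buffer)); // one bulk I/O instead of many
            buffer.clear();
        }
    }
}
```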
There are many other optimizations at the application level, for example using long-lived connections instead of short connections that are frequently opened and closed, compression, reuse, and so on. These are relatively simple and easy to understand, so I won't say much about them.
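As one small illustration of the long-connection point, a sketch using the standard Java 11+ HttpClient: reuse a single client instance so its underlying connections can be kept alive and pooled, rather than building a new client per request (the URL is a placeholder):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Reuse one client for the whole application instead of creating and
// tearing down a connection for every request.
public class ReusedClient {
    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    static String fetch(String url) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response = CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```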
Once things are done at the application level, let's consider optimization at the component level.
/ 02 Component level /
Components are the non-business parts of the stack, such as middleware, databases, and runtime environments (the JVM, the web server), and so on.
Generally speaking, database tuning falls into the following three parts:
SQL statement.
Indexes.
Connection pool.
As for the others: JVM tuning mainly means tuning GC-related configuration, and web-server tuning is mainly about connection-related settings. I won't dwell on these details; there is plenty of material to read.
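For the connection-pool part, here is a minimal sketch assuming HikariCP is on the classpath; the URL, credentials, and sizes are placeholders, and the right values depend on your workload and the database's own connection limits:

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

// A minimal connection-pool configuration sketch (values are illustrative).
public class PoolSetup {
    static HikariDataSource buildPool() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:mysql://db-host:3306/app"); // placeholder URL
        config.setUsername("app");
        config.setPassword("secret");
        config.setMaximumPoolSize(20);      // cap concurrent connections
        config.setMinimumIdle(5);           // keep a few warm connections
        config.setConnectionTimeout(3000);  // fail fast instead of queueing forever (ms)
        return new HikariDataSource(config);
    }
}
```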
/ 03 System level /
Some tuning work at the system level falls to operations engineers, and I am not particularly good at it. But we can use system-level metrics to observe and judge whether our program is behaving normally, for example CPU, threads, network, disk, and memory.
01 CPU
To judge whether the CPU is healthy, in most cases it is enough to watch three metrics: CPU utilization, CPU load average, and CPU context switches. Everyone basically knows CPU utilization, so let's look at the latter two.
When looking at the CPU load average, pay particular attention to the trend. If the 1-minute, 5-minute, and 15-minute values differ little, the system load is stable and there is nothing to worry about. If the values decrease from the 1-minute figure to the 15-minute figure, meaning the most recent value is the highest, the load is gradually rising and you need to investigate the cause.
CPU context switches. The more context switches there are, the more CPU time is spent saving and restoring registers, kernel stacks, virtual memory and the like, and the less time is spent on the computation you actually want, so overall system performance naturally drops. There are two main causes:
The program does a lot of disk I/O and network I/O.
The program has started too many threads.
02 Threads
On the thread side, in addition to the total number of threads, you also need to watch the number of threads in the "suspended" (blocked) state.
Too many suspended threads means there is fierce lock contention in the program; consider other designs to reduce the granularity and scope of the locks, or even avoid locks altogether.
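A minimal Java sketch of reducing lock granularity: the coarse version makes every thread contend on a single monitor, while the finer version spreads the contention across a concurrent structure:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Two ways to count events per key under concurrency.
public class Counters {
    // Coarse-grained: every increment contends on the same monitor.
    private final Map<String, Long> coarse = new HashMap<>();
    public synchronized void incrementCoarse(String key) {
        coarse.merge(key, 1L, Long::sum);
    }

    // Finer-grained: ConcurrentHashMap + LongAdder spread the contention.
    private final ConcurrentHashMap<String, LongAdder> fine = new ConcurrentHashMap<>();
    public void incrementFine(String key) {
        fine.computeIfAbsent(key, k -> new LongAdder()).increment();
    }
}
```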
03 Network
At the hardware level, internal-network bandwidth is usually much larger than external bandwidth, so it is more common for the external bandwidth to be saturated, especially for systems that serve lots of images and streaming media. Traffic-volume problems are generally easy to think of, so I won't say much about them.
However, Brother Z reminds you to pay special attention to port usage and the state of the connections on each port. A fairly common problem is connections not being released promptly after use, leaving ports occupied so that subsequent network requests cannot establish new connections. (Network-related information can be obtained with netstat and ss.)
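A minimal Java sketch of the "release the connection promptly" point: try-with-resources guarantees the socket is closed even when an error occurs, so the local port is not left occupied (host and port are hypothetical placeholders):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.Socket;

// Close network resources deterministically instead of relying on GC.
public class PromptClose {
    static String readGreeting() throws Exception {
        try (Socket socket = new Socket("example.internal", 7000);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream()))) {
            return in.readLine();
        } // socket closed here, releasing the connection promptly
    }
}
```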
04 Disk
Unless the system is very large, disk metrics are generally not a problem.
When looking at them, besides utilization, throughput, and request counts, there are two points that are easy to overlook.
First, if I/O utilization is high but throughput is low, the disk is doing a lot of random reads and writes, and it is best to turn random access into sequential access where possible. (You can use strace or blktrace to observe whether the I/O is contiguous, to determine whether the access pattern is sequential.)
Second, if the I/O wait queue is long, the disk has a performance problem. As a rule of thumb, take this seriously when the queue length stays above 2.
05 Memory
When watching memory, in addition to memory consumption, the amount of memory swapped in and out (Swap) deserves special attention. Swap has to read and write the disk, so its performance is not high. If an object traversed during GC happens to have been swapped out, a disk I/O is generated and performance naturally degrades. So this metric should not be too high.
Most memory problems come from resident objects not being released in time, and there are many tools for observing how objects are allocated, for example jmap, VisualVM, heap dumps, and so on.
If your program runs on a Linux system, do not miss the essentials from Brendan Gregg. The diagram credited below gives you a feel for the tooling landscape; see http://www.brendangregg.com/linuxperf.html for more.
▲ Image from brendangregg.com
Finally, although performance optimization is known to be a good thing, even good things come at a cost. So, unless it is necessary, do not optimize too early or too much.
All right, let's sum up.
In this article, Brother Z talked with you about program performance, a perennial headache for programmers. The way to avoid the headache is to do performance optimization well ahead of time.
Performance optimization cannot be improvised as you go. There are three things to do up front: "clarify the purpose of optimization", "set a standard", and "find the bottleneck".
For the optimization itself, I recommend starting from the application level, then the component level, and finally the system level, working top-down, layer by layer. Along the way I shared some common methods and ideas at each level.
I hope it will enlighten you.
In a large system, data is like water and the whole system is like a funnel, with each layer of the funnel representing a subsystem. The less performance is lost in the upper subsystems, the more water can flow down to the bottom layer, the "database", which you can also read as storage.
So hurry up and start the battle to defend the database.
Recommended reading:
Eight months of polishing: a "Distributed Systems" collection for programmers
Do "capacity estimation" without true and false
Author: Zachary
Source: https://zacharyfan.com/archives/1051.html
▶ About the author: Zhang Fan (Zachary, personal WeChat: Zachary-ZF), committed to polishing every article with high quality and originality.