How to understand the number of threads and CPU utilization 04/21 Update SLTechnology News&Howtos


2025-04-21 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/03 Report--


01 A quick test of thread count and CPU utilization

Setting aside operating-system and computer-architecture details, here is a basic rule of thumb (not rigorous, just for intuition): a CPU core can execute instructions from only one thread at a time. In theory, then, a single thread that executes instructions nonstop can drive one core to 100% utilization.

Let's verify this with a program that spins in an infinite loop:

Test environment: AMD Ryzen 5 3600, 6 cores, 12 threads.

```java
public class CPUUtilizationTest {
    public static void main(String[] args) {
        // Infinite loop that does nothing
        while (true) {
        }
    }
}
```

After running this example, take a look at the current CPU utilization:

As the screenshot shows, core No. 3 is now fully utilized. Based on the theory above, let's try starting a few more threads:

```java
public class CPUUtilizationTest {
    public static void main(String[] args) {
        // Start 6 threads, each spinning in an empty infinite loop
        for (int j = 0; j < 6; j++) {
            new Thread(new Runnable() {
                @Override
                public void run() {
                    while (true) {
                    }
                }
            }).start();
        }
    }
}
```

Now look at the CPU utilization again: cores 1, 2, 5, 7, 9, and 11 are fully utilized:

If we start 12 threads, will every core be fully utilized? Yes:

What happens if I continue to increase the number of threads in the above example to 24?

As the figure shows, CPU utilization is the same as in the previous step, with every core at 100%, but the load average has risen from 11.x to 22.x. The CPU is busier: there are more runnable threads than cores, so their work cannot be scheduled promptly.

For an explanation of load average, see:

https://scoutapm.com/blog/understanding-load-averages
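Besides a system monitor, the load average can also be read from inside a JVM. The sketch below is one way to do it, using the standard `java.lang.management.OperatingSystemMXBean`; the class name `LoadAverageCheck` and the helper `loadPerCore` are illustrative, not from the original article. Dividing the load by the number of logical cores gives a rough sense of how oversubscribed the CPU is: the 24-thread run above (load around 22 on 12 logical cores) works out to roughly 1.8.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class LoadAverageCheck {
    /** 1-minute system load average; negative if the platform does not provide one (e.g. Windows). */
    static double loadAverage() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        return os.getSystemLoadAverage();
    }

    /** Load per logical core: values well above 1.0 suggest more runnable work than cores. */
    static double loadPerCore(double load, int cores) {
        if (cores <= 0) {
            throw new IllegalArgumentException("cores must be positive");
        }
        return load / cores;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        double load = loadAverage();
        if (load < 0) {
            System.out.println("Load average is not available on this platform");
            return;
        }
        System.out.printf("logical cores=%d, load=%.2f, load per core=%.2f%n",
                cores, load, loadPerCore(load, cores));
    }
}
```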

Modern CPUs are almost all multi-core. The AMD Ryzen 5 3600 I tested on has 6 cores and 12 threads (hyper-threading), so we can simply treat it as a 12-core CPU: it can do 12 things at the same time without them interfering with each other.

When there are more threads to run than cores, the operating system has to schedule them: it gives each thread a slice of CPU time and keeps switching between threads, which produces the effect of "parallel" execution.

But is that actually faster? As the example above shows, a single thread can fully utilize one core.

If every thread is "greedy", executing instructions nonstop and leaving the CPU no idle time, and more such threads run at once than there are cores, the operating system has to switch between them more and more frequently to make sure each one gets to run.

However, switching is not free: each context switch involves saving and restoring register state and, when the next thread belongs to a different process, updating the memory page tables, among other operations.

The cost of a single switch is negligible compared with an I/O operation. But with too many threads, switching becomes so frequent that the time spent switching per unit of time can exceed the time spent running the program. CPU time is then wasted on context switching instead of useful work, and the losses outweigh the gains.
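To check the "is it really faster?" question on your own machine, one option is to split a fixed amount of CPU-bound work across a varying number of threads and time each run. The sketch below is a rough, hypothetical benchmark (`OversubscriptionDemo`, `spin`, and `runWithThreads` are made-up names) with no JIT warm-up, so it is not a rigorous measurement. Following the reasoning above, the expectation is that elapsed time stops improving once the thread count passes the number of logical cores, and may get slightly worse.

```java
import java.util.ArrayList;
import java.util.List;

public class OversubscriptionDemo {
    static volatile long sink; // publishes results so the JIT cannot discard the busy loops

    /** Pure CPU work with no waiting. */
    static long spin(long iterations) {
        long acc = 0;
        for (long i = 0; i < iterations; i++) {
            acc += i ^ (acc >>> 3);
        }
        return acc;
    }

    /** Splits totalIterations evenly across threadCount threads; returns elapsed wall-clock time in ms. */
    static long runWithThreads(int threadCount, long totalIterations) throws InterruptedException {
        long perThread = totalIterations / threadCount;
        long[] results = new long[threadCount];
        List<Thread> threads = new ArrayList<>();
        long start = System.nanoTime();
        for (int t = 0; t < threadCount; t++) {
            final int idx = t;
            Thread worker = new Thread(() -> results[idx] = spin(perThread));
            threads.add(worker);
            worker.start();
        }
        for (Thread worker : threads) {
            worker.join(); // join() also makes each worker's result visible here
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        long combined = 0;
        for (long r : results) {
            combined ^= r;
        }
        sink = combined;
        return elapsedMs;
    }

    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        long total = 2_000_000_000L;
        for (int n : new int[] {1, cores, cores * 2, cores * 4}) {
            System.out.printf("threads=%d elapsed=%d ms%n", n, runWithThreads(n, total));
        }
    }
}
```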

The empty infinite loop above is an extreme case; real programs rarely behave like that.

Most programs perform some I/O while running, such as reading and writing files or sending and receiving network messages, and these operations involve waiting for a response.

For example, during a network read or write, the thread has to wait for the message to be sent or received. While it waits, the thread is blocked and the CPU is not doing any work for it.

During that time, the operating system schedules other threads onto the CPU, making use of the otherwise idle period and improving CPU utilization.

In the example above, the program loops doing nothing, so the CPU executes instructions continuously and has almost no idle time.

What happens to CPU utilization if we insert an I/O operation, during which the CPU is idle?

First, let's take a look at the results under a single thread:

```java
public class CPUUtilizationTest {
    public static void main(String[] args) throws InterruptedException {
        for (int n = 0; n < 1; n++) {
            new Thread(new Runnable() {
                @Override
                public void run() {
                    while (true) {
                        // After 100 million empty iterations, sleep 50 ms to simulate I/O waiting and switching
                        for (int i = 0; i < 100_000_000L; i++) {
                        }
                        try {
                            Thread.sleep(50);
                        } catch (InterruptedException e) {
                            e.printStackTrace();
                        }
                    }
                }
            }).start();
        }
    }
}
```

Wow: the only busy core, No. 9, is at just 50% utilization, half of the 100% we saw without sleep.
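The 50% figure comes from a system monitor. As a cross-check from inside the JVM, one can compare how much CPU time a thread actually consumed with how much wall-clock time passed. The sketch below uses the standard `ThreadMXBean.getCurrentThreadCpuTime()`; the class and method names are hypothetical, and the busy loop does a little arithmetic (rather than the article's empty loop) so the JIT is less likely to remove it. The returned ratio approximates C / (C + W), the fraction of time the thread spends computing rather than sleeping; the exact value depends on how long the loop takes on your machine.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class ThreadCpuShare {
    static volatile long sink; // publishes loop results so the JIT cannot discard the busy work

    /** Busy work that keeps the thread on a core. */
    static long spin(long iterations) {
        long acc = 0;
        for (long i = 0; i < iterations; i++) {
            acc += i ^ (acc >>> 3);
        }
        return acc;
    }

    /**
     * Runs `rounds` cycles of (busy loop, then sleep) on the calling thread and returns
     * thread CPU time divided by wall-clock time, or -1 if CPU time measurement is unavailable.
     */
    static double measureCpuShare(int rounds, long iterations, long sleepMs) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isCurrentThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return -1.0;
        }
        long cpuStart = mx.getCurrentThreadCpuTime();
        long wallStart = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            sink = spin(iterations); // compute phase: the thread occupies a core
            Thread.sleep(sleepMs);   // wait phase (simulated I/O): the core is free for other threads
        }
        long cpuNanos = mx.getCurrentThreadCpuTime() - cpuStart;
        long wallNanos = System.nanoTime() - wallStart;
        return (double) cpuNanos / wallNanos;
    }

    public static void main(String[] args) throws InterruptedException {
        // Same shape as the article's loop: 100 million iterations, then a 50 ms sleep
        double share = measureCpuShare(20, 100_000_000L, 50);
        if (share < 0) {
            System.out.println("Thread CPU time is not available on this JVM");
            return;
        }
        System.out.printf("share of one core used by this thread: %.0f%%%n", share * 100);
    }
}
```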

Now adjust the number of threads to 12:

Each core's utilization is about 60%, not much different from the single-thread result, so the CPU is still not fully utilized. Now increase the number of threads to 18:

Now each core's utilization is close to 100%. This shows that when threads perform operations such as I/O that do not occupy the CPU, the operating system can schedule more threads onto the CPU at the same time.

Now make the I/O events more frequent by halving the loop count to 50 million iterations, again with 18 threads:

At this point, the utilization rate of each core is only about 70%.

02 A brief summary of thread count and CPU utilization

The examples above are only meant to help illustrate the relationship between thread count, program behavior, and CPU state.

Let's briefly sum up:

An "extreme" thread (one that performs computation nonstop) can fully utilize a single core, and a multi-core CPU can run at most as many such threads at once as it has cores.

If every thread is this "extreme" and more of them run at once than there are cores, the result is unnecessary context switching, higher load, and slower execution.

During operations that pause the thread, such as I/O, the CPU is idle and the operating system schedules other threads onto it. This improves CPU utilization and lets more threads run at the same time.

The more frequent the I/O events, or the longer each wait or pause, the longer the CPU sits idle and the lower its utilization, so the operating system can schedule more threads onto the CPU.

03 A formula for planning thread count

Everything so far was groundwork to build intuition. Now let's look at how the book defines it.

The book Java Concurrency in Practice gives a formula for calculating the number of threads:

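If you want the program to reach a target CPU utilization, the number of threads required is:

$$N_{threads} = N_{cpu} \times U_{cpu} \times \left(1 + \frac{W}{C}\right)$$

where $N_{cpu}$ is the number of CPU cores, $U_{cpu}$ is the target CPU utilization ($0 \le U_{cpu} \le 1$), and $W/C$ is the ratio of wait time to compute time.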

The formula is straightforward, so let's apply it to the example above.

If I target 90% utilization across all cores, then with 12 cores, 50 ms of sleep time, and about 50 ms to run the 50,000,000-iteration loop, the number of threads required is:

$$\text{threads} = 12 \times 0.9 \times \left(1 + \frac{50}{50}\right) = 21.6 \approx 22$$

Now set the number of threads to 22 and see the results:

CPU utilization is now about 80%, close to the target. Given the large number of threads, some context-switching overhead, and the informal nature of the test, it is normal for the actual utilization to come in lower.

Rearranging the formula, you can also compute the expected CPU utilization from the number of threads:

$$\text{utilization} = \frac{22 \text{ threads}}{12 \text{ cores} \times \left(1 + \frac{50}{50}\right)} = \frac{22}{24} \approx 0.92$$
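Both forms of the formula translate directly into code. This is a minimal sketch (the class and method names are hypothetical); `waitTime` and `computeTime` only need to share a unit, since just their ratio W/C matters. With the article's numbers, 12 cores at a 90% target with W/C = 50/50 gives 21.6, which rounds to 22, and 22 threads on 12 cores give 22/24, about 0.92.

```java
public class ThreadCountFormula {
    /** N_threads = N_cpu * U_cpu * (1 + W / C). waitTime and computeTime must use the same unit. */
    static double threads(int cores, double targetUtilization, double waitTime, double computeTime) {
        if (targetUtilization < 0 || targetUtilization > 1) {
            throw new IllegalArgumentException("target utilization must be in [0, 1]");
        }
        if (computeTime <= 0) {
            throw new IllegalArgumentException("compute time must be positive");
        }
        return cores * targetUtilization * (1 + waitTime / computeTime);
    }

    /** Rearranged: U_cpu = N_threads / (N_cpu * (1 + W / C)). */
    static double utilization(int threads, int cores, double waitTime, double computeTime) {
        if (cores <= 0 || computeTime <= 0) {
            throw new IllegalArgumentException("cores and compute time must be positive");
        }
        return threads / (cores * (1 + waitTime / computeTime));
    }

    public static void main(String[] args) {
        // The article's example: 12 logical cores, 90% target, 50 ms sleep vs about 50 ms of looping
        double n = threads(12, 0.9, 50, 50);
        System.out.printf("threads needed: %.1f, rounded: %d%n", n, Math.round(n));
        System.out.printf("utilization with 22 threads: %.2f%n", utilization(22, 12, 50, 50));
    }
}
```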

The formula is useful, but in real programs it is usually hard to measure wait time and compute time accurately, because real code does much more than pure "computation".

A single piece of code mixes memory reads and writes, computation, I/O, and other operations, so these two figures are hard to pin down, and relying on the formula alone to size the thread count is too idealistic.

04 Thread counts in real programs

So how many threads (what thread pool size) should you plan for in a real program, for example in a Java business system?

The short answer: there is no fixed number. First set expectations for metrics such as CPU utilization, load, and GC frequency, then keep adjusting through testing until you reach a reasonable thread count.

Take an ordinary Spring Boot business system with the default Tomcat container, the HikariCP connection pool, and the G1 garbage collector, where the project also needs multiple threads (or a thread pool) to run some business processes asynchronously or in parallel.

Planning the thread count with the formula above would give a large error here.

That is because many threads are already running on the host: Tomcat has its own thread pool, HikariCP has its own background threads, the JVM has JIT compiler threads, and even G1 has its own background threads.

These threads run in the same process on the same host and also consume CPU.

With all this interference from the environment, the formula alone cannot accurately plan the thread count; the result must be verified by testing.

The process generally goes like this:

1. Check whether other processes on the host are competing for resources.

2. Check whether the JVM process already has other threads running, or threads that may start later.

3. Set goals:
   - Target CPU utilization: how high can CPU usage climb before it becomes unacceptable?
   - Target GC frequency and pause time: multithreaded execution increases GC frequency, so what maximum frequency and per-pause time can you tolerate?
   - Execution efficiency: for example, in batch processing, how many threads must run per unit of time to finish on schedule?

4. Review the key points along the call chain for choke points. With too many threads, limited resources at some nodes can leave many threads waiting (for example, rate-limited third-party APIs, a limited connection pool, or middleware that cannot handle the pressure).

5. Keep increasing or decreasing the thread count and testing against the most demanding requirements, until you arrive at a thread count that meets them.
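The last step, repeatedly adjusting the thread count and re-testing, can be partly automated. Below is a hypothetical harness (`PoolSizeSweep` and `simulatedTask` are placeholders, not a real library) that runs the same batch of tasks on fixed-size pools of different sizes and reports throughput. The simulated task mixes a busy loop with a sleep to stand in for I/O. In a real system you would run the actual workload and also watch CPU utilization, load, GC behavior, and downstream resources, since throughput alone does not capture them.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class PoolSizeSweep {
    static volatile long sink; // keeps the busy loop's result observable

    /** Hypothetical stand-in for a business task: some CPU work, then a simulated I/O wait. */
    static void simulatedTask(long iterations, long waitMs) {
        long acc = 0;
        for (long i = 0; i < iterations; i++) {
            acc += i ^ (acc >>> 3);
        }
        sink = acc;
        try {
            Thread.sleep(waitMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the pool
        }
    }

    /** Runs taskCount tasks on a fixed pool of poolSize threads and returns completed tasks per second. */
    static double throughput(int poolSize, int taskCount, long iterations, long waitMs)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicLong completed = new AtomicLong();
        long start = System.nanoTime();
        for (int i = 0; i < taskCount; i++) {
            pool.execute(() -> {
                simulatedTask(iterations, waitMs);
                completed.incrementAndGet();
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
        double seconds = (System.nanoTime() - start) / 1e9;
        return completed.get() / seconds;
    }

    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        for (int size : new int[] {cores, cores * 2, cores * 4}) {
            double tps = throughput(size, 200, 5_000_000L, 10);
            System.out.printf("pool size=%d -> %.1f tasks/s%n", size, tps);
        }
    }
}
```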

Also, what "thread count" means differs from scenario to scenario:

Tomcat's maxThreads does not mean the same thing under blocking I/O and non-blocking I/O.

Dubbo uses a single connection by default and distinguishes between the I/O thread (pool) and the business thread (pool). Generally the I/O threads are not the bottleneck, so you don't need many of them, but the business threads can easily become a bottleneck.

Redis has also been multithreaded since 6.0, but only for I/O; the "business" (command) processing is still single-threaded.

So don't get hung up on exactly how many threads to configure. There is no standard answer: consider the scenario, set goals, and find the most suitable thread count through testing.

Some readers may object: "Our system is under no pressure, and we don't need such a finely tuned thread count. It's just a simple asynchronous scenario that doesn't affect the rest of the system."

That is perfectly normal. Many internal business systems have no real performance requirements; being stable and easy to use is enough. In that case, my recommendation is to set the number of threads equal to the number of CPU cores.
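Following that recommendation, a pool sized to the core count might look like the sketch below. It builds the same fixed-size pool with an unbounded queue that `Executors.newFixedThreadPool(n)` creates, but as an explicit `ThreadPoolExecutor` so the sizes are visible; the class name `CoreSizedPool` is hypothetical. For real use you may prefer a bounded queue and a rejection policy, so that a backlog cannot grow without limit.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CoreSizedPool {
    /** Fixed-size pool with one thread per logical core and an unbounded task queue. */
    static ThreadPoolExecutor newCoreSizedPool() {
        int cores = Runtime.getRuntime().availableProcessors();
        return new ThreadPoolExecutor(cores, cores, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    }

    public static void main(String[] args) throws InterruptedException, ExecutionException {
        ThreadPoolExecutor pool = newCoreSizedPool();
        try {
            Future<String> result = pool.submit(() -> "ran on " + Thread.currentThread().getName());
            System.out.println(result.get());
        } finally {
            pool.shutdown(); // let queued work finish, then release the threads
        }
    }
}
```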

05 Appendix

Getting the number of CPU cores in Java:

```java
// Returns the number of logical cores available to the JVM, e.g. 12 on a 6-core, 12-thread CPU
// (inside a container this can be lower than the host's core count)
Runtime.getRuntime().availableProcessors();
```

Getting the number of CPU cores on Linux:

```shell
# Total number of cores = number of physical CPUs × cores per physical CPU
# Total number of logical CPUs = number of physical CPUs × cores per physical CPU × hyperthreads per core

# Number of physical CPUs
cat /proc/cpuinfo | grep "physical id" | sort | uniq | wc -l

# Number of cores in each physical CPU
cat /proc/cpuinfo | grep "cpu cores" | uniq

# Number of logical CPUs
cat /proc/cpuinfo | grep "processor" | wc -l
```
