This article explains in detail how to analyze a RocketMQ performance stress test. The editor finds it very practical, so it is shared here as a reference; I hope you get something out of reading it.
1. Single-machine deployment
1.1 Machine composition
1 nameserver
1 broker (asynchronous disk flush)
2 producers
2 consumers
1.2 Hardware configuration
CPU: two x86_64 CPUs, 12 cores each, 24 cores in total
Memory: 48 GB
Network card: gigabit
Disk: the broker machine uses a RAID10 array of about 1.1 TB; the other machines use ordinary disks of about 500 GB
1.3 Deployment structure
In the deployment diagram, the orange arrows indicate data flow and the black lines indicate network connections.
1.4 Kernel parameters
The broker is a storage system: it has its own flush strategy for disk reads and writes, makes heavy use of memory-mapped files, and consumes a large number of file handles and a lot of memory. With the system defaults, RocketMQ cannot perform well, so specific settings are needed for the system's page cache, memory allocation, I/O scheduling, and file handle limits.
System I/O and virtual memory settings
echo 'vm.overcommit_memory=1' >> /etc/sysctl.conf
echo 'vm.min_free_kbytes=5000000' >> /etc/sysctl.conf
echo 'vm.drop_caches=1' >> /etc/sysctl.conf
echo 'vm.zone_reclaim_mode=0' >> /etc/sysctl.conf
echo 'vm.max_map_count=655360' >> /etc/sysctl.conf
echo 'vm.dirty_background_ratio=50' >> /etc/sysctl.conf
echo 'vm.dirty_ratio=50' >> /etc/sysctl.conf
echo 'vm.page-cluster=3' >> /etc/sysctl.conf
echo 'vm.dirty_writeback_centisecs=360000' >> /etc/sysctl.conf
echo 'vm.swappiness=10' >> /etc/sysctl.conf
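To apply the settings above without rebooting, the file can be reloaded and a few values spot-checked; a minimal sketch, not part of the original article:
sysctl -p                               # reload /etc/sysctl.conf
sysctl vm.swappiness vm.max_map_count   # confirm the values just written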
System file handle settings
echo 'ulimit -n 1000000' >> /etc/profile
echo 'admin hard nofile 1000000' >> /etc/security/limits.conf
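After logging in again, the new limit can be confirmed for the current session; a small sketch (also not from the original article):
ulimit -n    # soft limit, should report 1000000
ulimit -Hn   # hard limit, as set in limits.conf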
System I/O scheduling algorithm
deadline
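The scheduler is selected per block device through sysfs; a hedged example in which sda is an assumed device name to be replaced by the broker's data disk:
cat /sys/block/sda/queue/scheduler                # available schedulers, current one shown in brackets
echo deadline > /sys/block/sda/queue/scheduler    # switch this disk to the deadline scheduler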
1.5 JVM parameters
Use RocketMQ default settings
-server -Xms4g -Xmx4g -Xmn2g -XX:PermSize=128m -XX:MaxPermSize=320m -XX:+UseConcMarkSweepGC -XX:+UseCMSCompactAtFullCollection -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:SoftRefLRUPolicyMSPerMB=0 -XX:+CMSClassUnloadingEnabled -XX:SurvivorRatio=8 -XX:+DisableExplicitGC -verbose:gc -Xloggc:/root/rocketmq_gc.log -XX:+PrintGCDetails -XX:-OmitStackTraceInFastThrow
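These options are normally assembled in the broker start script rather than passed by hand. A minimal sketch, assuming the bin/runbroker.sh layout of the RocketMQ distribution, where options are appended to the JAVA_OPT variable (only a few of the flags above are repeated here):
JAVA_OPT="${JAVA_OPT} -server -Xms4g -Xmx4g -Xmn2g"
JAVA_OPT="${JAVA_OPT} -XX:+UseConcMarkSweepGC -XX:+UseCMSCompactAtFullCollection -XX:CMSInitiatingOccupancyFraction=70"
JAVA_OPT="${JAVA_OPT} -verbose:gc -Xloggc:/root/rocketmq_gc.log -XX:+PrintGCDetails"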
2. Performance evaluation
2.1 Evaluation purpose
Stress-test the single-machine TPS in order to evaluate single-machine capacity.
2.2 Evaluation indicators
The highest achievable TPS is not necessarily the most suitable TPS; a trade-off has to be made between TPS and the system resource indicators. The appropriate TPS is one at which system resources are close to their limits yet the system still operates normally: for example, I/O util should not exceed about 75%, the CPU load should not greatly exceed the total number of cores, and there should be no frequent swapping causing heavy memory thrashing. So do not focus only on TPS; also watch the following indicators (the commands sketched after this list are one way to sample them):
Message: TPS
CPU: load, sy, us
Memory: used, free, swap, cache, buffer
Disk I/O: util, await, svctm
Network: network card traffic
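While the stress test runs, these indicators can be sampled with standard Linux tools; a minimal sketch (1-second intervals chosen arbitrarily, not from the original article):
top -b -n 1 | head -5    # load average and us/sy CPU percentages
free -m                  # used, free, swap, cache and buffer memory in MB
iostat -x 1              # per-device iops, await, svctm and %util
sar -n DEV 1             # per-interface network traffic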
2.3 Evaluation method
Two producers continuously send 2 KB messages (roughly 1000 Chinese characters) to the broker. A message of this size is relatively large and fully covers the needs of the business.
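The article does not name the load generator, but Apache RocketMQ distributions ship benchmark scripts that can drive a comparable load. A hedged sketch, in which the topic name and the flag spellings (-t topic, -w thread count, -s message size in bytes) are assumptions to be checked against the producer.sh shipped with your version:
sh benchmark/producer.sh -t BenchmarkTest -w 22 -s 2048    # 2 KB messages; 22 threads per producer machine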
2.4 Evaluation results
TPS is relatively high
After long-running testing and observation, a single broker sustains a TPS as high as 16,000, i.e. the server processes 16,000 messages per second while the consumers keep up. The average latency from the broker storing a message to a consumer consuming it is about 1.3 s, and this latency does not grow as TPS increases, so it is a fairly stable value.
Broker has high stability.
The two producers run a total of 44 threads sending messages non-stop for 10 hours, and the broker stays very stable. In practical terms this means that in a production environment dozens of producers can send messages to a single broker at high frequency and the broker will remain stable. Under this pressure the broker's load peaks at only about 3 (on a 24-core CPU) and plenty of memory remains available.
Moreover, during the more than 10 hours of continuous testing the broker's JVM ran very smoothly: not a single full GC occurred, young-generation GC reclaimed memory very efficiently, and there was no memory pressure. The following line is taken from the GC log:
2014-07-17T22:43:07.407+0800: 79696.377: [GC2014-07-17T22:43:07.407+0800: 79696.377: [ParNew: 1696113K->18686K(1887488K), 0.1508800 secs] 2120430K->443004K(3984640K), 0.1513730 secs] [Times: user=1.36 sys=0.00, real=0.16 secs]
The young generation is 2 GB; occupancy is about 1.7 GB before a collection and roughly 18 MB after it, so collection efficiency is very high.
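The claim that no full GC occurred can be checked directly against the log file configured in the JVM options; a small sketch:
grep -c 'Full GC' /root/rocketmq_gc.log    # expected to print 0 for the whole run
grep -c 'ParNew' /root/rocketmq_gc.log     # number of young-generation collections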
Disk I/O and memory
A single physical I/O takes about 0.06 ms on average, and there is almost no extra I/O wait, since await and svctm are essentially equal. No physical disk reads occurred during the whole test: thanks to file memory mapping, a large amount of page cache held the file contents, and plenty of free memory remained.
Performance bottleneck of the system
After TPS reaches 16,000 it cannot go any higher. At that point the gigabit network card carries about 100 MB per second, which is essentially its limit, so the network card is the performance bottleneck. Meanwhile the peak I/O util has already reached about 40%, which is not low, so even if network bandwidth were increased the I/O indicators might become unhealthy. Overall, a single-machine TPS of 16,000 is a relatively safe value.
The following is the trend of each indicator
TPS
TPS peaks at about 16,000; pushing the load further causes TPS to trend downward.
Memory
The memory is very stable, with a total of 48 GB, and the actual available memory is very high.
No swapping occurs, so system performance does not thrash from frequent disk access.
A large amount of memory is used as file cache, as shown by the cached metric, which largely avoids physical disk reads.
Disk throughput
As the number of threads increases, physical disk I/O reads and writes about 70 MB of data per second.
I/O util percentage
As the number of threads increases, the I/O util finally stabilizes at around 40%, which is acceptable.
This concludes the article on RocketMQ performance stress test analysis. I hope the content above is of some help and lets you learn something new; if you found the article good, please share it so more people can see it.