2025-04-06 Update From: SLTechnology News&Howtos
Problem background
In our project, the Kafka interface is wrapped behind a RESTful API. While performance-testing this API with and without SSL, we found that performance degrades sharply once SSL is enabled, and the degradation grows with the number of topics. For example, with 600 topics in total (3 partitions and 3 replicas each), sending to only 10 of those topics from 1,200 threads reaches only 3,100 TPS with SSL enabled, while the same test without SSL reaches 11,000 TPS.
The test client starts multiple threads, each of which calls the RESTful API synchronously: a thread sends the next message only after the previous one has succeeded. The client divides the topics evenly among the sending threads, so with 1,200 threads and 10 topics, 120 threads send to each topic concurrently.
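As a hedged sketch (the class and method names below are made up for illustration, not from the actual test client), the thread-to-topic division described above looks like this:

```java
import java.util.HashMap;
import java.util.Map;

public class LoadDivision {
    // Hypothetical sketch of the test client's load division: topics are
    // split evenly across sending threads, and each thread sends
    // synchronously, i.e. it issues the next request only after the
    // previous one has succeeded.
    static Map<String, Integer> threadsPerTopic(int threads, int topics) {
        Map<String, Integer> counts = new HashMap<>();
        for (int t = 0; t < threads; t++) {
            // Round-robin split: each thread is pinned to one topic.
            counts.merge("topic-" + (t % topics), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // 1200 threads over 10 topics -> 120 concurrent senders per topic.
        System.out.println(threadsPerTopic(1200, 10));
    }
}
```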
Root-cause analysis
Problem 1: performance degradation with SSL
1. Analysis
Enabling SSL degrades performance mainly because of the extra CPU cost and the specifics of the JVM implementation; see the explanation on the official Kafka site:
However, in our earlier tests the SSL degradation in the high-reliability scenario was modest (from 23,000 TPS to 21,000 TPS), so something else must have been triggered here. Dumping the thread stacks with jstack while SSL was enabled showed that some sending threads were blocked:
The blocked stack traces back to java.security.SecureRandom generating random numbers with the "SHA1PRNG" algorithm. In the Sun/Oracle JDK, this algorithm relies on random data supplied by the operating system, by default /dev/random. /dev/random returns random bytes drawn from an entropy pool fed by environmental noise, which makes it suitable for highly random key material or one-time pads, but when the entropy pool is empty, reads from /dev/random block until enough noise has been collected. The same problem is reported elsewhere: the SecureRandom implementation in the JDK holds a global lock, so this blocking is easy to hit when the entropy source is insufficient and many SSL threads are running. For details, see:
https://github.com/netty/netty/issues/3639
http://bugs.java.com/view_bug.do?bug_id=6521844
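The contention side of the problem can be illustrated with a small, hedged micro-benchmark (not from the article; timings vary by machine). Many threads sharing one SHA1PRNG instance serialize on its internal lock, while per-thread instances avoid that contention. Note that either variant can still block on /dev/random during seeding if the entropy pool is empty, which is the failure mode seen in the jstack dump:

```java
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

public class SecureRandomContention {
    // Returns elapsed milliseconds for `threads` workers each drawing
    // random bytes, either from one shared SHA1PRNG instance or from
    // per-thread instances.
    static long run(int threads, boolean shareInstance) throws Exception {
        final SecureRandom sharedRng = SecureRandom.getInstance("SHA1PRNG");
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    SecureRandom rng = shareInstance
                            ? sharedRng
                            : SecureRandom.getInstance("SHA1PRNG");
                    byte[] buf = new byte[16];
                    for (int j = 0; j < 10_000; j++) {
                        rng.nextBytes(buf); // synchronized internally
                    }
                } catch (NoSuchAlgorithmException e) {
                    throw new RuntimeException(e);
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) w.join();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("shared instance:     " + run(8, true) + " ms");
        System.out.println("per-thread instance: " + run(8, false) + " ms");
    }
}
```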
2. Solutions
Measure 1: upgrade the JDK
This problem has been fixed in OpenJDK 1.8, so one option is to upgrade to OpenJDK. However, swapping the JDK is not easy to do in place, and it is unclear what incompatibilities might exist between OpenJDK and the JDK currently in use.
Measure 2: use a non-blocking entropy source, /dev/urandom
Setting the JVM option -Djava.security.egd=file:/dev/./urandom makes the random-number generator read from /dev/urandom, which reuses the data in the entropy pool to produce pseudorandom output and therefore never blocks, at the cost of slightly weaker randomness.
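For example (a hedged sketch: KAFKA_OPTS is the standard hook in Kafka's start scripts, but the exact placement depends on your deployment), the option can be passed to the broker JVM like this; the extra `/./` in the path is the widely used workaround for the JDK bug referenced above, where a plain `file:/dev/urandom` is silently mapped back to the blocking source:

```shell
export KAFKA_OPTS="-Djava.security.egd=file:/dev/./urandom"
bin/kafka-server-start.sh config/server.properties
```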
Problem 2: performance degradation with many topics
1. Analysis
With 600 topics, the latency gap between SSL and non-SSL turned out to be much smaller than initially observed. The following latency data was measured with the SDK API:
With 600 topics in total and 400 threads sending to 10 of them concurrently, non-SSL and SSL latencies compare as follows:
The latency gap is under 20%; the main latency increase is caused by the larger number of topics.
Why does a larger number of topics increase latency? To answer this, we instrumented the program with timing probes. The following compares non-SSL latency when sending 5,000 messages in total to 10 topics, under different total topic counts:
Total latency = queueing delay before sending + average server-side processing delay + network transmission and response delay.
The table shows that latency increased 4-5x in almost every processing stage. Why? We considered the following candidate causes:
1. Disk writes become slower.
2. The server slows down because it has to filter messages by topic.
3. Replication slows down with many topics: even for topics with no incoming data, the replication threads keep issuing (empty) fetch requests.
4. Each topic consumes resources.
Analyzing, testing, and eliminating these one by one showed that the main cause is the third point: replication processing slows down when there are many topics.
For example, with 10 topics and 10 replication (fetcher) threads (the configuration used in this performance test), each thread is assigned 1 topic. With 600 topics and still only 10 replication threads, each thread is assigned 60 topics. If only the first 10 topics receive traffic, it is quite possible that a single replication thread ends up doing all the work while the others stay essentially idle, because their assigned topics have no data.
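The skew can be sketched as follows. This is a simplified illustration, not Kafka's actual code: the real assignment lives in the broker's AbstractFetcherManager, and the hash used below is an assumption chosen only to show how few fetcher threads can end up serving the active topics:

```java
import java.util.HashSet;
import java.util.Set;

public class FetcherAssignment {
    // Simplified sketch: map a (topic, partition) pair to one of
    // `numFetchers` replica fetcher threads via a stable hash.
    static int fetcherFor(String topic, int partition, int numFetchers) {
        return Math.floorMod(31 * topic.hashCode() + partition, numFetchers);
    }

    // How many distinct fetcher threads serve the `activeTopics` topics
    // that are actually receiving traffic?
    static int activeFetchers(int activeTopics, int partitions,
                              int numFetchers) {
        Set<Integer> busy = new HashSet<>();
        for (int t = 0; t < activeTopics; t++) {
            for (int p = 0; p < partitions; p++) {
                busy.add(fetcherFor("topic-" + t, p, numFetchers));
            }
        }
        return busy.size();
    }

    public static void main(String[] args) {
        // 10 active topics x 3 partitions: with 10 fetchers they can occupy
        // at most 10 threads; with 60 fetchers they spread over many more,
        // so active partitions are less likely to queue behind idle topics.
        System.out.println("10 fetchers: " + activeFetchers(10, 3, 10));
        System.out.println("60 fetchers: " + activeFetchers(10, 3, 60));
    }
}
```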
2. Solutions
Since replication is the bottleneck, performance can be improved by adding replication threads. In the 600-topic scenario where only 10 topics receive traffic, we raised the number of replication threads to 60, so that the 10 active topics are spread over different threads as much as possible, speeding up replication. The actual test results:
With the number of fetcher threads raised to 60, latency drops to about 100 ms. In the original environment, after adding replication threads (setting num.replica.fetchers=60), even 1,200 sending threads with SSL enabled reach 11,000+ TPS.
Summary of performance improvement measures
The RESTful API is a synchronous interface, while the SDK it wraps internally sends asynchronously. Given that asynchronous sending can reach 20,000+ TPS in the high-reliability scenario, the remaining gap comes mainly from the concurrency pressure of the synchronous interface, which can be improved with the following measures:
1. Increase the batching wait time linger.ms
Setting linger.ms on the client makes each request wait up to the specified time before being sent, so each request carries more data, i.e. the batching rate increases.
2. Increase the number of threads sending synchronously to the same topic.
3. Reduce the number of partitions per topic
Because the RESTful API does not saturate the server (even a single partition's capacity limit is not reached), the default of 3 partitions wastes resources and lowers the batching rate. With 1 partition, the batching rate under the same load triples, which also improves performance; this measure additionally allows more topics to be supported.
4. Add replication threads
5. Consider providing an asynchronous-send SDK interface
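As a hedged configuration sketch of measures 1, 3, and 4 (the values below are illustrative, not measurements from this article; only num.replica.fetchers=60 was actually tested above):

```properties
# Producer side (REST layer), measure 1: wait to batch more per request.
linger.ms=50

# Broker side, measure 3: default partition count for new topics.
num.partitions=1

# Broker side, measure 4: more replica fetcher threads.
num.replica.fetchers=60
```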