How to fix a blank SkyWalking dashboard caused by an Elasticsearch write bottleneck

Shulou (Shulou.com), 06/01. Updated 2025-01-19. From: SLTechnology News&Howtos.

Preface

Less than a month after the last SkyWalking failure (described in "Using Arthas to help troubleshoot an unavailable online SkyWalking"), our online SkyWalking went wrong again. The dashboard was blank again, and recent data in the trace list could not be queried, although older data could. For example, data from a day earlier was there, but data from the last hour was not. That was only the symptom. The root cause turned out to be a write bottleneck on the ES service, which blocked SkyWalking's writing thread. The troubleshooting process and the solution are described below.

Locating the problem

The tool is again Arthas. If you are not familiar with it, see my previous blog post; I will not introduce Arthas here. This time we used another command, thread, which shows information about the current threads and their stacks. When the SkyWalking dashboard has no data, run the following command:

thread -b    (find the thread that is currently blocking other threads)

When an application appears stuck, it is usually because one thread holds a lock and other threads are waiting for it. To troubleshoot this kind of problem, Arthas provides thread -b, which identifies the culprit in one step. It produced the following result:

The output (a screenshot in the original post, with the key part marked by a red arrow) makes the problem clear, and Arthas deserves credit for how easy it made this. The crux is that the thread doing ES bulk writes is blocked. I later learned from the community that the cause was an ES write bottleneck: the thread blocked while SkyWalking was bulk writing to the index. The data for the period in which the thread was blocked never reached ES. Queries themselves worked fine, so the visible symptom was a blank SkyWalking dashboard and no recent data.
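The blocking pattern described above (one thread holds a lock while stalled on a slow write, and every other thread that needs the lock waits) can be reproduced in a few lines. This is only an illustrative sketch, not SkyWalking's or the ES client's code; all names in it (`run_demo`, `slow_writer`, `backend_ready`) are made up for the example.

```python
import threading


def run_demo(num_waiters=3, timeout_s=0.2):
    """Show how one stalled lock holder blocks every other thread."""
    lock = threading.Lock()            # stands in for the lock guarding the bulk writer
    holding = threading.Event()        # set once the slow writer owns the lock
    backend_ready = threading.Event()  # set when the "slow backend" finally answers
    acquired = [None] * num_waiters

    def slow_writer():
        with lock:
            holding.set()
            backend_ready.wait()       # stalled write: the lock stays held meanwhile

    def waiter(i):
        # Give up after timeout_s instead of hanging forever.
        got = lock.acquire(timeout=timeout_s)
        acquired[i] = got
        if got:
            lock.release()

    writer = threading.Thread(target=slow_writer)
    writer.start()
    holding.wait()                     # make sure the writer holds the lock first

    waiters = [threading.Thread(target=waiter, args=(i,)) for i in range(num_waiters)]
    for t in waiters:
        t.start()
    for t in waiters:
        t.join()

    backend_ready.set()                # the backend recovers, the writer releases the lock
    writer.join()

    free_again = lock.acquire(timeout=timeout_s)
    if free_again:
        lock.release()
    return acquired, free_again


if __name__ == "__main__":
    print(run_demo())
```

While the writer is stalled, none of the waiters can get the lock; once the backend answers, the lock is free again. In a JVM, thread -b reports the thread playing the role of `slow_writer` here.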

Solution

Interim solution: SkyWalking parameter tuning

SkyWalking writes to ES through ES's bulk write interface. We can enlarge the batches so that SkyWalking writes to the ES indexes less often, for example:

    elasticsearch:
      clusterNodes: 192.168.20.221:9200
      indexShardsNumber: 2
      indexReplicasNumber: 0
      # Batch process setting, refer to https://www.elastic.co/guide/en/elasticsearch/client/java-api/5.5/java-docs-bulk-processor.html
      bulkActions: 4000        # Execute the bulk every 4000 requests (default: 2000)
      bulkSize: 40             # Flush the bulk every 40 MB (default: 20)
      flushInterval: 30        # Flush the bulk every 30 seconds whatever the number of requests (default: 10)
      concurrentRequests: 2    # The number of concurrent requests
    receiver-register:
      default:
    receiver-trace:
      default:
        bufferPath: ../trace-buffer/        # Path to trace buffer files, suggest to use absolute path
        bufferOffsetMaxFileSize: 500        # Unit is MB
        bufferDataMaxFileSize: 1000         # Unit is MB
        bufferFileCleanWhenRestart: false

This raises bulkActions from the default of 2000 requests per bulk write to 4000, and bulkSize from 20 MB to 40 MB. The configuration also sets flushInterval to 30 seconds, where the default comment says 10. The tuning did take effect: for two or three days after restarting the service, ES writes did not block. However, this is only a temporary fix. It works only as long as traffic does not spike and no new applications are added. Once either happens, the ES write bottleneck shows up again. Overly large values also bring a new problem: write latency grows, so the trace of a service call only becomes queryable on the SkyWalking page after a long delay. The real solution is therefore to improve the write performance of ES.
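The trade-off between fewer writes and longer delay can be estimated with a rough model. The sketch below assumes that a bulk is flushed as soon as any one of the three thresholds (request count, buffered size, interval) is reached, and that traffic is perfectly steady. The traffic figures (500 and 50 requests per second, 2 KB per request) are invented for illustration and are not measurements from our system.

```python
from dataclasses import dataclass


@dataclass
class BulkSettings:
    bulk_actions: int        # flush after this many buffered requests
    bulk_size_mb: float      # flush after this many MB are buffered
    flush_interval_s: float  # flush after this many seconds regardless


def seconds_until_flush(settings, requests_per_s, avg_request_kb):
    """Longest time a request sits in the buffer before some threshold fires."""
    by_count = settings.bulk_actions / requests_per_s
    by_size = settings.bulk_size_mb * 1024 / (requests_per_s * avg_request_kb)
    return min(by_count, by_size, settings.flush_interval_s)


def flushes_per_minute(settings, requests_per_s, avg_request_kb):
    """How many bulk writes reach ES per minute under steady traffic."""
    return 60 / seconds_until_flush(settings, requests_per_s, avg_request_kb)


DEFAULTS = BulkSettings(bulk_actions=2000, bulk_size_mb=20, flush_interval_s=10)
TUNED = BulkSettings(bulk_actions=4000, bulk_size_mb=40, flush_interval_s=30)

if __name__ == "__main__":
    # Hypothetical traffic levels: a busy period and a quiet period.
    for rate in (500, 50):
        for name, cfg in (("defaults", DEFAULTS), ("tuned", TUNED)):
            wait = seconds_until_flush(cfg, rate, avg_request_kb=2)
            per_min = flushes_per_minute(cfg, rate, avg_request_kb=2)
            print(f"{rate} req/s, {name}: flush every {wait:.1f}s, {per_min:.1f} bulk writes/min")
```

Under these assumptions the tuned settings halve the number of bulk writes during the busy period, while in the quiet period the wait before data becomes visible grows from the 10-second interval to the 30-second one. The model ignores concurrentRequests and ES-side back pressure.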

Final solution: optimize the write performance of ES

If you run a self-hosted Elasticsearch service, the infrastructure big data team responsible for Elasticsearch optimization and development has shared many tunable configuration parameters on its blog. Considering the staffing and cost of operations, however, we decided to use the Elasticsearch service provided by Aliyun. That brings a new problem: Aliyun's ES service requires HTTP Basic authentication, which the current SkyWalking does not support.
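For reference, HTTP Basic authentication means that every request carries an Authorization header of the form "Basic " followed by the Base64 encoding of user:password. The sketch below only shows how such a header is built. It is not a SkyWalking patch, and the user name and password in it are placeholders.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of the HTTP Authorization header for Basic auth."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    # Placeholder credentials, not real ones.
    headers = {"Authorization": basic_auth_header("elastic", "changeme")}
    print(headers)
```

Any client that talks to the ES HTTP endpoint has to attach this header to each request. Because the credentials are only encoded, not encrypted, they should be sent over HTTPS.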
