Example Analysis of Filebeat Optimization Practice


This article shares a practical example of Filebeat optimization. We think it is quite useful, so we are sharing it here and hope you get something out of it. Without further ado, let's get started.

Background Introduction

At present, the mainstream log collection stacks are ELK (ES + Logstash + Kibana), EFK (ES + Fluentd + Kibana), and so on. Because Logstash appeared earlier, most log file collection is done with Logstash. However, since Logstash is implemented in JRuby, its performance overhead is large, so we use Filebeat for log collection and then send the data to Logstash for processing (for example, parsing JSON or extracting fields from file names with regular expressions); Logstash finally sends the data to Kafka or ES. Although this approach reduces the processing pressure on each node, the cost of running the Logstash nodes is still high, and Filebeat often fails to deliver data to Logstash in time.

Abandon Logstash

Because of Logstash's large performance overhead, and in order to improve log collection performance on the client, shorten the data transmission path, reduce deployment complexity, and take advantage of Go's performance for log parsing, we decided to implement parsing of the company's log format specification directly in Filebeat as a custom plugin, so that Filebeat can replace Logstash entirely.

Developing Our Own Processor

Our platform is based on Kubernetes, so we need to determine the source of every log: the Kubernetes resource name is extracted from the log file name and used to decide which Topic the log is sent to. Parsing the file name requires regular-expression matching, which is expensive; if every log line triggered a regex parse of its file name, the overhead would be significant. We therefore use a cache: each file name is parsed only once and the result is stored in a map, and file names that have already been resolved are never parsed again. This greatly improves Filebeat's throughput. A rough sketch of the idea is shown below.
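The following is only a minimal sketch of the caching approach, not the actual kubernetes_metadata processor; the regex, the field layout, and the topicForFile helper are illustrative assumptions.

package processor

import (
	"regexp"
	"sync"
)

// Kubernetes container log file names look roughly like
//   <pod>_<namespace>_<container>-<container-id>.log
// (the exact pattern here is an assumption for illustration).
var fileNameRe = regexp.MustCompile(
	`^(?P<pod>[^_]+)_(?P<namespace>[^_]+)_(?P<container>.+)-[0-9a-f]{64}\.log$`)

var (
	cacheMu sync.RWMutex
	cache   = map[string]string{} // file name -> resolved topic
)

// topicForFile resolves a log file name to a topic, running the regex only
// once per distinct file name and serving all later lookups from the cache.
func topicForFile(name string) string {
	cacheMu.RLock()
	topic, ok := cache[name]
	cacheMu.RUnlock()
	if ok {
		return topic // cache hit: no regex on the hot path
	}

	topic = "default"
	if m := fileNameRe.FindStringSubmatch(name); m != nil {
		topic = m[2] // use the namespace as the topic in this sketch
	}

	cacheMu.Lock()
	cache[name] = topic
	cacheMu.Unlock()
	return topic
}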

Performance Optimization

The Filebeat configuration file is shown below; kubernetes_metadata is the processor we developed ourselves.

################### Filebeat Configuration Example #########################

############################# Filebeat ######################################
filebeat:
  # List of prospectors to fetch data.
  prospectors:
    - paths:
        - /var/log/containers/*
      symlinks: true
#     tail_files: true
      encoding: plain
      input_type: log
      fields:
        type: k8s-log
        cluster: cluster1
        hostname: k8s-node1
      fields_under_root: true
      scan_frequency: 5s
      max_bytes: 1048576  # 1M

  # General filebeat configuration options
  registry_file: /data/usr/filebeat/kube-filebeat.registry

############################# Libbeat Config ##################################
# Base config file used by all other beats for using libbeat features

############################# Processors ######################################
processors:
- decode_json_fields:
    fields: ["message"]
    target: ""
- drop_fields:
    fields: ["message", "beat", "input_type"]
- kubernetes_metadata:
    # Default

############################# Output ##########################################
# Configure what outputs to use when sending the data collected by the beat.
# Multiple outputs may be used.
output:
  file:
    path: "/data/usr/filebeat"
    filename: filebeat.log

Test environment:

Performance test tool: github.com/urso/ljtest

Flame graph generation: Uber's go-torch (https://github.com/uber/go-torch)

CPU limited to a single core via runtime.GOMAXPROCS(1)
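For reference, a trivial sketch of how that restriction might be applied at the start of the benchmark (the surrounding benchmark code is assumed, not shown):

package main

import (
	"fmt"
	"runtime"
)

func main() {
	// Restrict Go to one core so the test measures single-core throughput.
	prev := runtime.GOMAXPROCS(1)
	fmt.Printf("GOMAXPROCS: %d -> 1\n", prev)

	// ... start the Filebeat pipeline / benchmark here ...
}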

The performance data for the first version are as follows:

Average speed: 11970/s; total time for 1 million log lines: 83.5 seconds

The generated CPU flame graph is shown below.

The flame graph shows two main CPU hot spots. One is the output section, which writes to files. The other is a bit odd: the common.MapStr.Clone() method takes up 34.3% of the CPU time, and within it errors.Errorf accounts for 21%. Look at the code:

func toMapStr(v interface{}) (MapStr, error) {
	switch v.(type) {
	case MapStr:
		return v.(MapStr), nil
	case map[string]interface{}:
		m := v.(map[string]interface{})
		return MapStr(m), nil
	default:
		return nil, errors.Errorf("expected map but type is %T", v)
	}
}

errors.Errorf spends a lot of time building error objects. By moving this type-check logic into MapStr.Clone() itself, the error is never created on the common path. Worth thinking about: although Go's error handling is a very good design, it must not be abused (worth repeating three times!), or you may pay a terrible price for it.
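As a rough sketch of the idea (not necessarily the exact change that was applied), the type check can be inlined into Clone() with a switch so that plain, non-map values never go through errors.Errorf:

// MapStr mirrors libbeat's common.MapStr type.
type MapStr map[string]interface{}

// Clone returns a deep copy of m. The type check is done inline, so
// non-map values are copied directly without ever allocating an error.
func (m MapStr) Clone() MapStr {
	result := MapStr{}
	for k, v := range m {
		switch inner := v.(type) {
		case MapStr:
			result[k] = inner.Clone()
		case map[string]interface{}:
			result[k] = MapStr(inner).Clone()
		default:
			result[k] = v
		}
	}
	return result
}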

After optimization:

Average speed: 18687/s; total time for 1 million log lines: 53.5 seconds

Processing speed actually increased by more than 50%. We did not expect that optimizing a few lines of code could improve throughput this much. Surprised? Now look at the flame graph after the change.

In the new flame graph, the cost of MapStr.Clone() is negligible.

Further optimization:

Our logs are all produced by Docker in JSON format, and Filebeat uses Go's standard encoding/json package, which is based on reflection and has known performance problems. Since our log format is fixed and the parsed fields are fixed, we can serialize and deserialize JSON based on a fixed log structure instead of relying on slow reflection. Several third-party Go packages generate JSON (de)serialization code for a given struct; we use easyjson: https://github.com/mailru/easyjson.

Because the log format to be parsed is fixed, we define the log structure in advance and then parse it with easyjson (a sketch follows the results below). Processing speed increased to:

Average speed: 20374/s; total time for 1 million log lines: 49 seconds
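For illustration only (the struct below and its fields are an assumption based on Docker's json-file log format, not the code actually used), parsing a fixed structure with easyjson looks roughly like this:

package dockerlog

// DockerLogEntry describes one line written by Docker's json-file
// logging driver. Running `easyjson -all` over this file generates
// reflection-free MarshalJSON/UnmarshalJSON methods for the type.
//easyjson:json
type DockerLogEntry struct {
	Log    string `json:"log"`
	Stream string `json:"stream"`
	Time   string `json:"time"`
}

// decodeLine parses one raw log line using the generated code instead of
// the reflection-based encoding/json package.
func decodeLine(raw []byte) (*DockerLogEntry, error) {
	var entry DockerLogEntry
	if err := entry.UnmarshalJSON(raw); err != nil {
		return nil, err
	}
	return &entry, nil
}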

However, this change turns decode_json_fields into a processor that can only handle one specific log format, which narrows its applicability, so for now we have not merged the JSON parsing change.

Log processing has always been an important part of system operations, whether in a traditional setup or in cloud-platform log collection based on Kubernetes (or Mesos, Swarm, etc.). Whichever way you collect logs, you may hit a performance bottleneck, and a small code improvement may completely solve your problem. The road is long, and optimization never ends.

A few things to note:

Filebeat is based on version 5.5.1, built with Go 1.8.3.

In the tests, Filebeat was limited to a single core with runtime.GOMAXPROCS(1).

Since the tests were run on the same machine with the same data, writing the output logs to a file has little impact on the comparison of results.

The above is an example analysis of Filebeat optimization practice. Some of these points may come up in your daily work, and we hope you found something useful here.
