
How to Build a Massive Log Analysis Platform Based on the Elastic Stack


In this issue, the editor walks through how to build a massive log analysis platform based on the Elastic Stack. The article covers the topic in depth from a practical, professional point of view; I hope you get something out of it.

Introduction to Elastic Stack

Elastic Stack includes Beats, Elasticsearch, Logstash, Kibana, APM, and more, with ELK as its core suite. Elasticsearch is a real-time full-text search and analysis engine that provides the three major functions of collecting, analyzing, and storing data. It exposes open REST and Java APIs that deliver efficient search on a scalable distributed system, and it is built on top of the Apache Lucene search engine library. Logstash is a tool for collecting, analyzing, and filtering logs. It supports almost any type of log, including syslog, error logs, and custom application logs. It can receive logs from many sources, including syslog, message queues (such as RabbitMQ), and JMX, and it can output data in a variety of ways, including e-mail, websockets, and Elasticsearch. Kibana is a Web-based graphical interface for searching, analyzing, and visualizing log data stored in Elasticsearch indices. It uses Elasticsearch's REST interface to retrieve data, allowing users not only to create custom dashboard views of their own data but also to query and filter the data in ad hoc ways.
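To make the REST interface mentioned above concrete, here is a minimal sketch of indexing one log document and querying it back with the Query DSL over curl. It assumes a single Elasticsearch node listening on localhost:9200 (version 7 or later, where the _doc endpoint applies) and a hypothetical index named app-logs-2025.01.18; the field names are illustrative, not taken from the article.

# Index one log event into a hypothetical daily index
curl -X PUT "http://localhost:9200/app-logs-2025.01.18/_doc/1" \
  -H 'Content-Type: application/json' \
  -d '{"@timestamp": "2025-01-18T10:15:00Z", "level": "ERROR", "message": "connection refused"}'

# Full-text search for ERROR-level events across matching indices
curl -X GET "http://localhost:9200/app-logs-*/_search" \
  -H 'Content-Type: application/json' \
  -d '{"query": {"match": {"level": "ERROR"}}}'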

Beats is a family of lightweight data acquisition tools, including the following (a minimal Filebeat configuration sketch follows this list):

1. Packetbeat (collects network traffic data)

2. Topbeat (collects data such as CPU and memory usage at the system, process, and file-system levels)

3. Filebeat (collects log file data)

4. Winlogbeat (collects Windows event log data)

5. Metricbeat (collects system-level CPU usage, memory, file system, disk I/O, and network I/O statistics)

6. Auditbeat (collects Linux audit logs)
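To show how lightweight the Beats side of the stack is in practice, below is a minimal filebeat.yml sketch (Filebeat 6.x+ syntax) that tails a hypothetical log file and ships events directly to a single Elasticsearch node; the path and host are placeholders, not values from the article.

filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/nginx/access.log    # hypothetical file to tail
output.elasticsearch:
  hosts: ["http://localhost:9200"]   # placeholder single-node Elasticsearch

In the later architectures the output section simply points at Logstash or Kafka instead of Elasticsearch.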

System Architecture

The first architecture is the simplest ELK architecture. Its advantage is that it is easy to build and easy to use; its disadvantages are that Logstash consumes a lot of resources, occupying significant CPU and memory, and that there is no message queue to buffer data, so there is a risk of data loss. It is recommended for small clusters. In this architecture, Logstash is distributed across the nodes to collect the relevant logs and data; after analysis and filtering, the data is sent to Elasticsearch on a remote server for storage.

Elasticsearch compresses and stores the data in the form of shards and provides a variety of APIs for users to query and operate on it. Users can also configure Kibana as a web portal to query the logs more intuitively and to generate reports from the data.

The second architecture introduces a message queue. The Logstash agent on each node first passes the data/logs to Kafka (or Redis); the messages in the queue are then passed on to Logstash, which filters and analyzes them and transfers the data to Elasticsearch for storage. Finally, the logs and data are presented to the user by Kibana. Because of the introduction of Kafka (or Redis), even if the remote Logstash server stops running due to a failure, the data is buffered in the queue first, which avoids data loss. This architecture is suitable for large clusters, but because the load on the central Logstash node and on Elasticsearch is relatively heavy, both can be configured in cluster mode to share the load. The advantage of this architecture lies in the message queue, which smooths network transmission and thus reduces the possibility of network congestion and, in particular, of data loss; however, Logstash still consumes too many system resources.
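To make the central node of this second architecture concrete, here is a hedged Logstash pipeline sketch: it consumes a Kafka topic, parses each event as JSON, and writes the result to Elasticsearch. The broker address, topic, consumer group, and index pattern are all placeholders chosen for illustration.

input {
  kafka {
    bootstrap_servers => "kafka1:9092"        # placeholder broker address
    topics            => ["app-logs"]         # placeholder topic
    group_id          => "logstash-consumers" # placeholder consumer group
  }
}
filter {
  json { source => "message" }                # assumes producers ship JSON events
}
output {
  elasticsearch {
    hosts => ["http://es1:9200"]              # placeholder Elasticsearch address
    index => "app-logs-%{+YYYY.MM.dd}"        # daily indices
  }
}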

The third architecture introduces Logstash-forwarder. Logstash-forwarder collects the log data and sends it to Logstash on the master node; Logstash analyzes and filters the log data and sends it to Elasticsearch for storage, and Kibana finally presents the data to the user. This architecture solves the problem of Logstash occupying too many system resources on every node: tests show that, compared with Logstash, the CPU and memory occupied by Logstash-forwarder are almost negligible. In addition, the communication between Logstash-forwarder and Logstash is encrypted with SSL, which ensures security. For a large cluster, users can also configure a Logstash cluster and an Elasticsearch cluster under this architecture and introduce a high-availability mechanism to improve the safety of data transmission and storage. More importantly, running multiple Elasticsearch services improves search and data storage efficiency. However, under this framework the communication between Logstash-forwarder and Logstash must be encrypted with SSL, which imposes certain restrictions.
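The SSL requirement shows up directly in the Logstash configuration: the lumberjack input that Logstash-forwarder speaks has mandatory certificate and key settings. A minimal sketch with a placeholder port and placeholder paths might look like the following (Logstash-forwarder itself has long been superseded by Beats, as the next architecture describes).

input {
  lumberjack {
    port            => 5043                                          # placeholder port
    ssl_certificate => "/etc/pki/tls/certs/logstash-forwarder.crt"   # placeholder certificate path
    ssl_key         => "/etc/pki/tls/private/logstash-forwarder.key" # placeholder key path
  }
}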

The fourth architecture replaces Logstash-forwarder with Beats. Testing shows that the system resource consumption of Beats at full load is on par with that of Logstash-forwarder, but its extensibility and flexibility are greatly improved. The early Beats platform included three products: Packetbeat, Topbeat, and Filebeat, all licensed under Apache 2.0, and users can carry out secondary development according to their needs. This architecture follows the same principle as the third, but it is more flexible and scalable; at the same time, Logstash and Elasticsearch clusters can be configured to support monitoring and querying of operation and maintenance log data in large cluster systems.
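On the Logstash side, replacing the forwarder with Beats simply means opening a beats input, and on the Filebeat side pointing output.logstash at it; SSL becomes optional rather than mandatory. A minimal sketch with placeholder hosts:

# Logstash side (pipeline input)
input {
  beats {
    port => 5044                      # conventional Beats port
  }
}

# Filebeat side (filebeat.yml fragment)
output.logstash:
  hosts: ["logstash-host:5044"]       # placeholder Logstash address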

An example of the system architecture: a MySQL log audit system

The MySQL log audit system uses the Percona audit plug-in to audit MySQL access and records the results in a specified file. Each MySQL server's audit log is centralized via Rsyslog into a specified directory on the Rsyslog server; Filebeat monitors file changes there and reports them to Kafka. Logstash consumes the data, filters and splits it, and writes it into ES, and users query the relevant data through Kibana. The overall flow of the system is: Percona audit plug-in -> local audit file -> Rsyslog (imfile) -> Rsyslog server -> Filebeat -> Kafka -> Logstash -> Elasticsearch -> Kibana.

Because the Percona build of MySQL Server is used, auditing is done with Percona's audit plug-in. To avoid consuming too much performance, the audit log records only connections and outputs them to a file. The audit logs are collected through the imfile module of Rsyslog and sent to the Rsyslog server for unified storage. The files received on the Rsyslog server are reported to Kafka through Filebeat. After that, Logstash consumes Kafka's data, filters and splits it, and writes it to ES. Users can then query the data they need in Kibana.
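Pulling the collection side of this audit pipeline together, here is a hedged configuration sketch covering the three hops described above: the Percona audit_log plug-in writing connection events to a file, the rsyslog imfile module forwarding that file to the central Rsyslog server, and Filebeat on that server shipping the centralized files to Kafka. All paths, host names, and topic names are placeholders, not values taken from the article.

# MySQL (my.cnf fragment): Percona audit_log plug-in, connection events only
[mysqld]
plugin-load-add  = audit_log=audit_log.so
audit_log_policy = LOGINS                    # audit connections only, to limit overhead
audit_log_format = JSON
audit_log_file   = /var/log/mysql/audit.json

# rsyslog on the MySQL host: watch the audit file and forward it to the central server
module(load="imfile")
input(type="imfile" File="/var/log/mysql/audit.json" Tag="mysql-audit:" Facility="local6")
local6.* @@rsyslog-server.example.com:514    # TCP forward to a placeholder central host

# filebeat.yml on the Rsyslog server: ship the centralized files to Kafka
filebeat.inputs:
  - type: log
    paths:
      - /data/rsyslog/mysql-audit/*.log      # placeholder centralized directory
output.kafka:
  hosts: ["kafka1:9092"]                     # placeholder broker
  topic: "mysql-audit"

The Logstash consumer for the mysql-audit topic then follows the same shape as the Kafka pipeline sketched earlier, with the Elasticsearch index named accordingly.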

The above is the editor's walkthrough of building a massive log analysis platform based on the Elastic Stack. If you have similar questions, the analysis above may help you work through them. If you want to learn more, you are welcome to follow the industry information channel.
