
A Ten-Thousand-Word Guide: ELK (v7) Deployment and Architecture Analysis

2025-01-16 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/02 Report--

Author: Qin Wei

1. ELK Background and Application Scenarios

A running project application generates a large number of logs, and we often need those logs to analyze how the server project is behaving and to locate where a BUG occurred. In general, you can get the information you want directly with tail -f, grep, and awk on the log file. In large-scale scenarios, however, this approach is inefficient: the log volume is too large, text search is slow, and multi-dimensional queries are hard to express. The logs on each server therefore need to be aggregated. The common solution is a centralized log collection system that collects, manages, and provides access to the logs on all nodes.
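For instance, the traditional single-server approach this paragraph describes might look like the following sketch (the file path and log contents are illustrative):

```shell
# create a small sample log (illustrative contents)
printf '%s\n' \
  '2019-05-01 10:00:01 INFO service started' \
  '2019-05-01 10:00:02 ERROR NullPointerException at LoginController' \
  '2019-05-01 10:00:03 INFO request handled' > /tmp/app.log

# locate errors directly in the file
grep 'ERROR' /tmp/app.log

# count lines per log level with awk (the level is the third field)
awk '{count[$3]++} END {for (l in count) print l, count[l]}' /tmp/app.log
```

This works on one host, but across dozens of servers and gigabytes of logs it quickly breaks down, which is exactly the problem a centralized system solves.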

In general, a large-scale system uses a distributed deployment architecture, with different service modules deployed on different servers. When a problem occurs, it usually has to be traced to a specific server and service module based on the key information the problem exposes, so building a centralized log system improves the efficiency of locating problems. A complete centralized logging system needs the following main features:

Collection - the ability to collect log data from multiple sources, both service logs and system logs.

Transmission - stable transmission of log data to the central system.

Storage - storing log data and persisting it.

Analysis - UI-based analysis, with customizable views for inspecting log operations.

ELK provides a complete, open-source solution; its components cooperate with each other, connect seamlessly, and efficiently cover many scenarios. It is currently a mainstream log system.

2. Introduction to ELK

ELK is the abbreviation of three open-source projects: Elasticsearch, Logstash, and Kibana. FileBeat was added to the stack later.

That is, ELK is mainly composed of three components: Elasticsearch (search), Logstash (collection and analysis), and Kibana (display).

The components are described as follows:

Filebeat: a lightweight data collection engine. In early ELK architectures, Logstash collected and parsed the logs, but Logstash consumes a lot of memory, CPU, I/O, and other resources; using it to collect logs on every server increases the servers' load. Compared with Logstash, the CPU and memory that Beats occupies is almost negligible, so Filebeat, as a lightweight log collection agent, can replace Logstash for collection. Because it takes up so few resources, it is better suited to collecting logs on each server and forwarding them to Logstash, which is also the official recommendation. [collects logs]

Logstash: a data collection and processing engine. It supports dynamically collecting data from a variety of sources and filtering, analyzing, enriching, and normalizing it before storing it for later use. [filters and analyzes logs]

Elasticsearch: a distributed search engine. It is an open-source distributed search server based on Lucene, characterized by high scalability, high reliability, and easy management. It supports full-text retrieval, structured retrieval, and analytics, and can combine all three. [indexes, analyzes, and stores data]

Kibana: a visualization platform. It searches and displays data indexed in Elasticsearch and makes it easy to present and analyze data with charts, tables, and maps. [graphically displays logs]

Combined with the above, the common log system architecture is as follows:

As shown in the figure above, log files are collected on each server by Filebeat and gathered by Logstash, where the data is filtered, analyzed, enriched, and normalized; it is then sent to Elasticsearch for structured retrieval, analysis, and storage, and finally displayed by Kibana.

This is only the most basic structure of the logging system, which needs to be further optimized in the production environment.

3. Deployment of the ELK System (configuration is described for the production-environment architecture)

Environment: CentOS7.5 deploys ELK version 7

Preparatory work: CentOS7.5

Elasticsearch7, logstash7, kibana7

Turn off the firewall and SELinux, and update the yum source (optional): yum -y update
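The firewall and SELinux steps are not spelled out above; on CentOS 7.5 they would typically look like the following sketch (adapt to your own security policy before using):

```shell
# stop and disable firewalld
systemctl stop firewalld
systemctl disable firewalld

# switch SELinux to permissive now, and disable it permanently across reboots
setenforce 0
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config

# optionally update the yum source
yum -y update
```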

The ELK version in this distributed deployment is the latest release as of around May 2019, with a number of new features, which will be described in more detail later. The installation steps above are greatly simplified; source or package installation would work equally well. This time YUM plus the Docker plugin is used, which achieves the same effect.

3.1 Filebeat installation and configuration

#!/bin/bash

# Install Filebeat

rpm --import https://packages.elastic.co/GPG-KEY-elasticsearch

cat > /etc/yum.repos.d/elk-elasticsearch.repo

java 5519 root 169u IPv6 528516 0t0 TCP 192-168108-200:XmlIpcRegSvc->192.168.108.187:49368 (ESTABLISHED)

java 5519 root 176u IPv6 528096 0t0 TCP 192-168108-200:48062->192-168108-200:XmlIpcRegSvc (ESTABLISHED)

java 5519 root 177u IPv6 528518 0t0 TCP 192-168108-200:XmlIpcRegSvc->192-168108-200:48062 (ESTABLISHED)

java 5519 root 178u IPv6 528097 0t0 TCP 192-168108-200:XmlIpcRegSvc->192.168.108.187:49370 (ESTABLISHED)

Background startup: bin/kafka-server-start.sh -daemon config/server.properties

Service startup script:

cd /opt/kafka

kill -9 $(ps -ef | grep kafka | grep -v grep | awk '{print $2}')

bin/zookeeper-server-start.sh -daemon config/zookeeper.properties

bin/kafka-server-start.sh -daemon config/server.properties

Zookeeper+Kafka cluster testing

/ opt/kafka/bin

1. Create a topic:

kafka-topics.sh --create --zookeeper 192.168.108.200:2181,192.168.108.165:2181,192.168.108.103:2181 --replication-factor 3 --partitions 3 --topic test

2. Describe a topic:

kafka-topics.sh --describe --zookeeper 192.168.108.200:2181,192.168.108.165:2181,192.168.108.103:2181 --topic test

[root@192-168108-200 bin]# kafka-topics.sh --describe --zookeeper 192.168.108.200:2181,192.168.108.165:2181,192.168.108.103:2181 --topic tyun

Topic:tyun PartitionCount:3 ReplicationFactor:3 Configs:

Topic: tyun Partition: 0 Leader: 1 Replicas: 1,0,2 Isr: 1,0,2

Topic: tyun Partition: 1 Leader: 2 Replicas: 2,1,0 Isr: 2,1,0

Topic: tyun Partition: 2 Leader: 0 Replicas: 0,2,1 Isr: 0,2,1

PartitionCount: number of partitions

ReplicationFactor: number of replicas

Partition: partition number, incrementing from 0

Leader: the broker.id on which this partition's leader currently works

Replicas: the list of broker.id values holding a copy of this partition's data; the first entry is the preferred leader

Isr: the list of in-sync replica broker.id values currently available in the Kafka cluster

3. List topics:

kafka-topics.sh --list --zookeeper 192.168.108.200:2181,192.168.108.165:2181,192.168.108.103:2181

Test

Create a producer:

kafka-console-producer.sh --broker-list 192.168.108.200:9092 --topic test

Hello

Create a consumer:

kafka-console-consumer.sh --bootstrap-server 192.168.108.200:9092 --topic test --from-beginning

Hello

4. Check the messages written to the Kafka cluster (an important command, and the key basis for determining whether logs are being written to Kafka):

bin/kafka-console-consumer.sh --bootstrap-server 192.168.108.200:9092 --topic tiops --from-beginning

5. Delete a topic (the command only flags it for deletion; delete the corresponding data directory afterwards):

bin/kafka-topics.sh --delete --zookeeper master:2181,slave1:2181,slave2:2181 --topic topic_name

If delete.topic.enable=true, the Topic is deleted directly and completely.

If delete.topic.enable=false:

If the Topic has never been used (no messages have been transferred), it can be deleted completely.

If the Topic has been used (messages have been transferred), it is not really deleted; it is only marked for deletion, and it is removed after Kafka Server restarts.

Note: the delete.topic.enable=true setting lives in the configuration file config/server.properties (newer versions have no explicit entry; the default is true).
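To confirm how a broker is actually configured, the setting can be checked directly (the /opt/kafka path follows this deployment's layout):

```shell
# print the line if set; otherwise report the default
grep -n '^delete.topic.enable' /opt/kafka/config/server.properties 2>/dev/null \
  || echo 'delete.topic.enable not set explicitly (defaults to true in newer versions)'
```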

Analysis of the log system architecture

The whole system contains 10 hosts (Filebeat is deployed on the clients and not counted): four Logstash, two Elasticsearch, a three-node Kafka cluster, and one Kibana host configured with an Nginx proxy.

Architectural explanation:

(1) First, users access the ELK log statistics platform through the Nginx proxy, where Nginx can require an interface password.

(2) Nginx forwards the request to Kibana.

(3) Kibana fetches data from Elasticsearch. Elasticsearch here is a two-server cluster, and log data is stored on either Elasticsearch server at random.

(4) Logstash2 takes data from Kafka and sends it to Elasticsearch.

(5) The Kafka servers persist the log data, avoiding log loss when the web servers' log volume outpaces collection and storage; Kafka runs as a cluster, and the Logstash servers continuously retrieve data from it.

(6) Logstash3 extracts the log information from Filebeat and saves it into Kafka.

(7) Filebeat collects logs on the client side.

Note 1: [reason and function of joining Kafka]

Kafka is added so the whole system is better layered. As message-flow processing and persistent-storage software, Kafka shields the master node from the differences between the log files of the many slave nodes. The people managing the log side (the slave nodes) can focus on producing data into Kafka, while the people responsible for data analysis and aggregation can focus on consuming data from Kafka. Hence Kafka is added to the deployment.

Kafka is used for log transmission because it can buffer data, its data can be consumed repeatedly, and Kafka itself is highly available, which guards well against data loss; its throughput is good, and it is widely used. It effectively prevents log loss even if Logstash goes down. In summary: it smooths out network transmission, reducing the chance of network congestion and, above all, of data loss.

Note 2: [double-layer Logstash effect]

Why put a Logstash layer in front of Kafka here? Writing a large volume of log data directly easily leads to data loss and confusion; to solve this, a Logstash layer is added to aggregate and classify logs by type, reducing bloated data transmission.

With only one Logstash layer processing and analyzing the logs collected from the different client Filebeats, large-scale log data would, to some extent, be processed chaotically and the load would become much heavier. The two-tier structure therefore balances the load and divides the responsibilities: one layer does simple aggregation and routing, the other analyzes and filters the information. The inner layer also runs two Logstash instances to keep the service highly available and improve the stability of the whole architecture.

Next, the principle and the interaction between the various components (configuration files) are described respectively.

1. Filebeat and Logstash-collect connection configuration

To make it easier to remember, call the Logstash here Logstash-collect. First, look at the configuration file of Filebeat:

In this configuration, Filebeat outputs to logstash on port 5044; port 5044 is opened on Logstash as the communication endpoint between Logstash and Filebeat, while the Logstash service itself starts on port 9600.

[root@192-168108-191 logstash]# ss -lnt

State Recv-Q Send-Q Local Address:Port Peer Address:Port

LISTEN 0 128 *:5044 *:*

LISTEN 0 50 ::ffff:192.168.108.191:9600 :::*

vim /etc/filebeat/filebeat.yml   # the important settings are as follows

- type: log

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    #- /var/log/*.log
    - /var/log/tiops/*/*.log   # all log locations of the tiops platform: every file matching /tiops/*/*.log. Note that logs directly under /tiops/ are not read, but logs in its grandchild directories are.
    #- c:\programdata\elasticsearch\logs\*

  fields:
    service: filebeat

  multiline:        # merge multiple lines into one event; useful when one log entry spans several lines, such as error call stacks in various languages
    pattern: '^\['
    negate: true
    match: after

  # exclude_lines: a list of regular expressions to match; matching lines are dropped, e.g. lines starting with DBG:
  exclude_lines: ['^DBG']

Filebeat can send logs to Logstash, Elasticsearch, or Kibana; choose one of them. Logstash is selected this time, and the other two outputs can be commented out.

output.logstash:
  # The Logstash hosts
  hosts: ["192.168.108.191:5044", "192.168.108.87:5044"]   # sent to the second layer, the two Logstash-collect instances
  loadbalance: true
  worker: 2

#output.elasticsearch:
  # hosts: ["http://192.168.108.35:9200"]
  # username: "elk"
  # password: ""

#setup.kibana:
  # host: "192.168.108.182:5601"
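After editing filebeat.yml, the configuration and the connection to the Logstash hosts can be verified with Filebeat's built-in test subcommands (a sketch, assuming the RPM install paths used above):

```shell
# syntax-check the configuration file
filebeat test config -c /etc/filebeat/filebeat.yml

# verify that the configured logstash outputs are reachable
filebeat test output -c /etc/filebeat/filebeat.yml
```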

Note 1: [log message flow between Filebeat and Logstash-collect]

Note: how does Logstash-collect receive the messages sent by Filebeat and send them on to the next level?

It is not enough to configure the IP addresses and ports of the two Logstash-collect instances; also pay attention to the fields setting, which can be understood as a key. When Logstash-collect receives multiple keys, it can select one or more of them to forward to the next level, which achieves the routing of log messages.

Since ELK version 6.2.3, an if [type] condition can no longer match strings defined with document_type in Filebeat, because document_type was removed in version 6.0. To achieve the configuration above, only the following works:

fields:

  service: filebeat

service: filebeat is self-defined. Once defined, it can be matched with Logstash's if conditional: if [fields][service] == "filebeat". That is all. For details, see the forwarding strategy below.

Note 2: [output of Filebeat load balancing host]

Another point to note here: the load-balancing settings for the Filebeat-to-Logstash-collect connections.

Logstash is stateless stream-processing software. A Logstash cluster can only be scaled out and then distributed with configuration-management tools, because the instances do not communicate with each other.

Filebeat provides configuration options to adjust load balancing when sending messages to multiple hosts. To enable load balancing, set loadbalance to true.

output.logstash:
  # The Logstash hosts
  hosts: ["192.168.108.191:5044", "192.168.108.87:5044"]   # the two Logstash-collect instances
  loadbalance: true
  worker: 2

loadbalance: false   # messages are sent to only one logstash; if it fails, data is automatically sent to the other one (active/standby mode)

loadbalance: true    # data is distributed evenly across the logstash instances; a failed instance receives nothing, and traffic goes to the surviving ones

That is, when the logstash address is a list and loadbalance is enabled, the load is spread across the listed servers; when one logstash server is unreachable, events are distributed to the reachable ones (active/active mode).

The loadbalance option is available for the Redis, Logstash, and Elasticsearch outputs; the Kafka output handles load balancing internally.

At the same time, each load-balanced host also supports multiple workers; the default is 1. Increasing the number of workers uses additional network connections. Total number of workers participating in load balancing = number of hosts × workers.

The overall structure configured in this section and the resulting flow of log data are as follows:

2. Logstash-collect and Kafka connection configuration

First look at the /etc/logstash/logstash.yml file. As the common configuration file for both Logstash-collect instances, it needs the following changes:

vim /etc/logstash/logstash.yml

path.data: /var/lib/logstash        # data storage path

path.config: /etc/logstash/conf.d   # directory from which configuration files are read

path.logs: /var/log/logstash        # path where log files are saved

pipeline.workers: 2                 # defaults to the number of CPU cores

http.host: "192.168.108.186"

http.port: 9600-9700                # logstash will pick up the first available port

Note the configuration read path /etc/logstash/conf.d above. In this folder we can define Logstash's inputs and outputs: Logstash supports writing configuration to files such as /etc/logstash/conf.d/xxx.conf and then collecting data according to those configuration files.

The basic flow of Logstash log collection is input -> codec -> filter -> codec -> output, so a configuration file defines input, output, and filtering in the following basic format (filter is omitted here, because this Logstash layer mainly aggregates data and does not yet analyze or match it):

About codec => json (JSON processing):

Logstash eventually wraps the data as JSON: the @timestamp time field, host field, and type field are added by default, and the original message data is wrapped in the message field. If parsing adds fields during processing, the final result carries those extra fields; fields can also be removed during processing. In short, Logstash's final output is in JSON format. Extra fields are therefore added as data passes through Logstash, and you can choose to filter them out.
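As a concrete illustration of that envelope, a single collected line might end up as an event like the following (the field values are made up for illustration; the exact field set depends on the pipeline):

```shell
cat <<'EOF' > /tmp/sample-event.json
{
  "@timestamp": "2019-05-01T02:00:02.000Z",
  "host": "192-168108-200",
  "type": "log",
  "fields": { "service": "filebeat" },
  "message": "2019-05-01 10:00:02 ERROR NullPointerException at LoginController"
}
EOF

# the original log line is wrapped in the "message" field,
# while "@timestamp" and "host" are added by Logstash by default
grep -cE '"@timestamp"|"message"' /tmp/sample-event.json   # prints 2: both fields are present
```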

Logstash collects data mainly through three components: input, filter, and output.

input {
  # specify input
}

filter {
}

output {
  # specify output
}

The explanation is as follows:

Input

The input component is responsible for reading data. You can use the file plugin to read local text files, the stdin plugin to read standard input, the tcp plugin to read network data, the log4j plugin to read data sent by log4j, and so on. This time the beats plugin is used to read the log messages sent by Filebeat.

Filter

The filter plugin filters and parses the data read by input. You can use the grok plugin to parse data with regular expressions, the date plugin to parse dates, the json plugin to parse JSON, and so on.

Output

The output plugin emits the data processed by filter. You can use the elasticsearch plugin to output to ES, the redis plugin to output to Redis, the stdout plugin for standard output, the kafka plugin to output to Kafka, and so on.

In the actual Tiops platform log processing process, the configuration is as follows:

Both Logstash-collect instances use this same configuration.

Under the / etc/logstash/conf.d directory, create the filebeat-kafka.conf file as follows:

input {
  beats {
    port => 5044    # logstash opens port 5044 as the communication endpoint between logstash and filebeat
    codec => json   # JSON processing
  }
}

# Redis as a message cache is appended here as an alternative; Kafka is used this time, so this section is commented out.
#output {
#  if [fields][service] == "filebeat" {
#    redis {
#      data_type => "list"
#      host => "192.168.108.200"
#      db => "0"
#      port => "6379"
#      key => "tiops-tcmp"
#    }
#  }
#}

output {
  if [fields][service] == "filebeat" {   # see part 1: take the key from Filebeat's fields setting and forward the messages downstream
    kafka {
      bootstrap_servers => "192.168.108.200:9092,192.168.108.165:9092,192.168.108.103:9092"   # kafka cluster configuration; all IP:port pairs must be listed
      topic_id => "tiops"   # the Kafka topic; kafka creates it automatically when messages arrive
    }
  }
}
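Before restarting the service, the pipeline file can be syntax-checked with Logstash's test flag (the binary path assumes the standard RPM layout):

```shell
# parse the pipeline configuration and exit without starting the service
/usr/share/logstash/bin/logstash \
  -f /etc/logstash/conf.d/filebeat-kafka.conf \
  --config.test_and_exit
```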

The overall structure configured in this section and the resulting flow of log data are as follows:

3. Kafka and Logstash-grok connection configuration

In the second part above, we successfully sent data from Logstash-collect to Kafka, while Kafka's own input and output were not configured; Logstash-collect merely specified a topic. This does not mean Kafka needs no configuration parameters at all, only that Kafka itself needs no pull/push target parameters: it passively accepts data sent by producers, because Logstash-collect specifies the IP addresses and ports of the Kafka cluster. For Kafka to handle the log data, its message-processing parameters must be configured.

For now we simply give the parameters. The deeper questions, MQ high availability, message expiration policies, preventing duplicate consumption, preventing message loss, guaranteeing ordered execution, and handling message backlogs in the queue, will be analyzed and solved in detail later.

Below is the configuration of one of the Kafka cluster nodes. The config/server.properties file in the Kafka source directory is set as follows (the important settings were highlighted in the original):

broker.id=0

listeners=PLAINTEXT://192.168.108.200:9092

advertised.listeners=PLAINTEXT://192.168.108.200:9092

num.network.threads=3

num.io.threads=8

socket.send.buffer.bytes=102400

socket.receive.buffer.bytes=102400

socket.request.max.bytes=104857600

log.dirs=/var/log/kafka-logs

num.partitions=1

num.recovery.threads.per.data.dir=1

offsets.topic.replication.factor=1

transaction.state.log.replication.factor=1

transaction.state.log.min.isr=1

log.retention.hours=168

log.segment.bytes=1073741824

log.retention.check.interval.ms=300000

zookeeper.connect=192.168.108.200:2181,192.168.108.165:2181,192.168.108.103:2181   # zookeeper cluster

delete.topic.enable=true

At the same time, Kafka is a distributed message queue that depends on the ZooKeeper registry, so a ZooKeeper standalone or cluster environment is required. This deployment uses the ZooKeeper that ships with Kafka.

The /kafka/config/zookeeper.properties file in the Kafka source directory is set as follows:

dataDir=/opt/zookeeper   # create a myid file under the /opt/zookeeper folder containing the broker.id; on this node it is 0

# the port at which the clients will connect
clientPort=2181

# disable the per-ip limit on the number of connections since this is a non-production config
maxClientCnxns=100

tickTime=2000

initLimit=10

syncLimit=5

server.0=192.168.108.200:2888:3888

server.1=192.168.108.165:2888:3888

server.2=192.168.108.103:2888:3888
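The myid file mentioned in the dataDir comment must exist on every node before ZooKeeper starts; a sketch for this three-node cluster:

```shell
# on 192.168.108.200 (server.0):
mkdir -p /opt/zookeeper
echo 0 > /opt/zookeeper/myid
# on 192.168.108.165 use "echo 1", and on 192.168.108.103 use "echo 2",
# matching the server.N entries in zookeeper.properties
```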

For other considerations, see the earlier Kafka cluster installation steps; here only the configuration parameters of one cluster node are given.

So far we have configured Kafka's message-handling parameters (persistence, etc.). Next, the messages in Kafka need to be delivered to Logstash-grok.

Note that Kafka does not push messages on its own initiative (although Kafka supports that operation too); here, Logstash-grok acts as a consumer and actively pulls messages from Kafka for consumption.

So, as in the second step for Logstash-collect, input and output must be configured; for Logstash-grok, however, the most important part is not input and output but its filter function.

In the /etc/logstash/conf.d directory, create the logstash-es.conf file. One of the Logstash-grok configurations is given below (the two Logstash-grok configurations are identical):

#input {
#  redis {
#    data_type => "list"
#    host => "192.168.108.200"
#    db => "0"
#    port => "6379"
#    key => "tiops-tcmp"
#    password => "123456"
#  }
#}

input {
  kafka {
    bootstrap_servers => ["192.168.108.200:9092,192.168.108.165:9092,192.168.108.103:9092"]   # kafka cluster addresses to pull from
    group_id => "tyun"              # consumer group: a consumption offset is kept per group and committed to kafka periodically (controlled by auto_commit), so messages already consumed are not consumed again on the next start; to consume them again, switch to another group_id
    auto_offset_reset => "earliest"
    consumer_threads => "3"         # summed over all instances, consumer_threads should equal the number of topic partitions, similar to the effect of running multiple logstash instances
    decorate_events => "false"
    topics => ["tiops"]             # kafka topic name; fetch messages from the specified topic
    type => "log"
    codec => json
  }
}

filter {   # Logstash's most important feature here: filtering. Regular expressions can be customized to selectively output messages.
  grok {
    match => {
      # intercept
      "message" => "(?
