Logstash is an open-source, server-side data processing pipeline that can receive data from multiple sources, transform it, and send it on to one or more destinations. Logstash implements its functionality through a plugin mechanism: you can download plugins for various tasks or write your own.
Logstash's work is divided into three stages: receiving data; parsing, filtering, and transforming it; and outputting it. These stages correspond to input plugins, filter plugins, and output plugins. The filter stage is optional, while the other two are mandatory; that is, every complete Logstash configuration file must contain at least one input plugin and at least one output plugin.
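To make that concrete, here is a minimal sketch of a complete pipeline in Logstash's configuration language: it reads from standard input and writes to standard output. Both sections are required; a filter section could be inserted between them but is not needed.

input {
    stdin { }
}
output {
    stdout { }
}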
Commonly used input plugins
- file: reads from a file, working much like the Linux tail command, picking up new lines in real time.
- syslog: listens on the system's port 514 for syslog messages and parses them according to RFC 3164.
- redis: reads data from a redis server; here redis acts as a message-caching component.
- kafka: reads data from a kafka cluster. The kafka-plus-Logstash architecture is common in business scenarios with large data volumes, where kafka serves as a data buffer and store.
- beats: receives data sent by filebeat, a lightweight text-log collector with stable performance that consumes very few system resources.
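Pulling a few of these together, a hedged configuration sketch might look like the following; the file path, broker address, and topic name are hypothetical placeholders, not values mandated by Logstash:

input {
    file {
        path => "/var/log/messages"        # hypothetical log file to tail
    }
    beats {
        port => 5044                       # port filebeat ships events to
    }
    kafka {
        bootstrap_servers => "kafka1:9092" # hypothetical broker address
        topics => ["app-logs"]             # hypothetical topic name
    }
}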
Commonly used filter plugins
Filter plugins handle data filtering, parsing, and formatting, that is, parsing unstructured data into structured, searchable, standardized data. Common filter plugins include the following (a configuration sketch follows the list):
- grok: the most important Logstash filter plugin. It can parse and structure arbitrary text, supports regular expressions, and ships with many built-in patterns and templates.
- mutate: provides rich basic data-processing capabilities, including type conversion, string manipulation, and field handling.
- date: converts time strings found in log records.
- geoip: looks up regional information for an IP address, including country, province, latitude, and longitude; very useful for map visualizations and regional statistics.
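As an illustration of how these plugins combine, here is a hedged sketch; the sample log layout, grok patterns, and field names (client, logdate, response) are assumptions for the example, not anything this article's setup requires:

filter {
    grok {
        # parse a line such as: 55.3.244.1 GET /index.html 200
        match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:response}" }
    }
    date {
        # move a parsed time string into the event timestamp; the field name is hypothetical
        match => ["logdate", "dd/MMM/yyyy:HH:mm:ss Z"]
    }
    mutate {
        convert => { "response" => "integer" }  # string-to-integer type conversion
    }
    geoip {
        source => "client"                      # look up region info for the grok-parsed IP
    }
}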
Commonly used output plugins
- elasticsearch: sends data to elasticsearch.
- file: writes data to a file.
- redis: sends data to redis; as you can see, the redis plugin can serve as either an input or an output plugin.
- kafka: sends data to kafka; like the redis plugin, it works on both the input and output sides.
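A hedged sketch of an output section follows; the elasticsearch address and the output file path are hypothetical, and the index pattern simply illustrates the common one-index-per-day convention:

output {
    elasticsearch {
        hosts => ["http://localhost:9200"]  # hypothetical elasticsearch address
        index => "logstash-%{+YYYY.MM.dd}"  # one index per day
    }
    file {
        path => "/tmp/logstash-out.log"     # hypothetical output file
    }
}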
If you installed the software from the rpm package, Logstash's configuration files are in the /etc/logstash directory. There, jvm.options is the configuration file for JVM memory resources, and logstash.yml is Logstash's global properties file; neither usually needs to be modified. There is also a pipelines.yml file, which the Logstash process reads at startup. Its contents point to the configuration files in the conf.d subdirectory of the current directory; files in conf.d end in .conf, and they hold the input, filter, and output plugin configuration.
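On an rpm install, the default pipelines.yml typically looks roughly like the following, wiring every .conf file in conf.d into a single pipeline named main (shown here as a sketch of the stock file, not a required configuration):

- pipeline.id: main
  path.config: "/etc/logstash/conf.d/*.conf"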
Let's first take a look at how logstash implements input and output; we won't add a filter plugin here.
(For an installation from the rpm package, the logstash executable is in the /usr/share/logstash/bin/ directory.)
[root@172.31.22.29 /etc/logstash/conf.d]# /usr/share/logstash/bin/logstash -e ""
Sending Logstash logs to /var/log/logstash which is now configured via log4j2.properties
`date` this timestamp is OK    # enter this information, then press the Enter key
{
       "message" => "`date` this timestamp is OK",
          "host" => "ip-172-31-22-29.ec2.internal",
    "@timestamp" => 2019-01-22T02:59:01.422Z,
          "type" => "stdin",
      "@version" => "1"
}
One thing worth remembering: an empty -e "" is shorthand for Logstash's default stdin-to-stdout configuration, so these two commands are equivalent:

/usr/share/logstash/bin/logstash -e ""
/usr/share/logstash/bin/logstash -e 'input { stdin { type => stdin } } output { stdout { codec => rubydebug } }'
What we get here is a simple interactive interface: whenever I type a line, the logstash service echoes a structured event back. Let's unpack the meaning of the command:
-e stands for execute: the configuration is supplied on the command line. input declares the input section, and stdin selects standard input (typing at the terminal) as the source. output declares the output section, and stdout selects standard output (printing to the terminal) as the destination. codec is a plugin that controls data format; placed inside stdout, it sets the output format. rubydebug is a format intended for testing, commonly used to print JSON-formatted events at the terminal.
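To see the codec's effect, you could swap rubydebug for another standard codec such as json_lines, which prints each event as a single compact JSON line; a hedged sketch:

/usr/share/logstash/bin/logstash -e 'input { stdin { } } output { stdout { codec => json_lines } }'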
Logstash renders its output in JSON format:
In the output, Logstash adds some extra information to the event. For example, @version, host, and @timestamp are all newly added fields, and the most important of them is @timestamp, which marks when the event occurred. Because this field is involved in Logstash's internal flow, renaming another string field to @timestamp makes Logstash report an error outright; the field cannot be deleted, either. There is also a type field that marks the event's unique type, and a tags field that marks attributes of the event.
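For illustration, type and tags can be set directly on an input plugin; the values below are hypothetical labels chosen for this sketch, not anything Logstash requires:

input {
    stdin {
        type => "stdin"             # event type label; the value is arbitrary
        tags => ["test", "manual"]  # hypothetical tags attached to every event
    }
}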
The example above is the simplest possible use of logstash. Most production environments, however, use the -f parameter to read a configuration file. As mentioned above, configuration files usually live in the /etc/logstash/conf.d directory and must end in .conf to be read by the logstash service.
This time, let's take the configuration file as an example:
1) First, go to the conf.d directory, create the configuration file l1.conf, and enter the following:
input {
    file {
        path => "/var/log/.txt"
    }
}
output {
    stdout {
        codec => rubydebug
    }
}
Save and exit. This configuration tells Logstash to read the log file /var/log/.txt: as soon as data appears in that file, it is read out.
Next, start the logstash service:
[root@172.31.22.29 /etc/logstash/conf.d]# /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/l1.conf
Sending Logstash logs to /var/log/logstash which is now configured via log4j2.properties
[2019-01-22T03:19:03,462][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2019-01-22T03:19:03,486][INFO ][logstash.runner] Starting Logstash {"logstash.version"=>"6.5.4"}
[2019-01-22T03:19:08,344][INFO ][logstash.pipeline] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>4, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2019-01-22T03:19:08,655][INFO ][logstash.inputs.file] No sincedb_path set, generating one based on the "path" setting {:sincedb_path=>"/var/lib/logstash/plugins/inputs/file/.sincedb_0d6c5b209e03529a50b2eca9300b7d96", :path=>["/var/log/.txt"]}
[2019-01-22T03:19:08,706][INFO ][logstash.pipeline] Pipeline started successfully {:pipeline_id=>"main", :thread=>"#"}
[2019-01-22T03:19:08,773][INFO ][logstash.agent] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2019-01-22T03:19:08,783][INFO ][filewatch.observingtail] START, creating Discoverer, Watch with file and sincedb collections
[2019-01-22T03:19:09,220][INFO ][logstash.agent] Successfully started Logstash API endpoint {:port=>9600}
A pile of startup messages will appear; they don't interfere with the experiment.
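Note the "No sincedb_path set" line in the log above: the file input records its read position in a sincedb file so that a restart does not re-read old data. If you wanted to control this explicitly, the file input's documented sincedb_path and start_position options could be set; the sincedb location below is a hypothetical example:

input {
    file {
        path => "/var/log/.txt"
        start_position => "beginning"                   # read the file from the start on first discovery
        sincedb_path => "/var/lib/logstash/sincedb-l1"  # hypothetical explicit sincedb location
    }
}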
Next, enter a line into /var/log/.txt from another terminal:
[root@172.31.22.29 /etc/logstash]# echo "`date` + timestamp is OK" >> /var/log/.txt
Then go back to the original terminal to view the content:
{"message" = > "Tue Jan 22 03:21:32 UTC 2019 + timestamp is OK", "@ version" = > "1", "@ timestamp" = > 2019-01-22T03:21:33.843Z, "path" = > "/ var/log/.txt", "host" = > "ip-172-31-22-29.ec2.internal"}
The logstash service has read the /var/log/.txt file and displayed the collected data.
Next, let's interpret the configuration file l1.conf:
First, look at the input section: the input source is defined as file, and the file's path is specified as /var/log/.txt, meaning this file's content serves as the input source. The path attribute here is a required setting, and it must be an absolute path, not a relative one. To monitor multiple files, separate them with commas, as follows:
path => ["PATH1", "PATH2", "PATH3"]
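The file input's path option also accepts shell-style wildcard patterns, so a sketch like the following should watch every .log file under /var/log plus one named file (the exact paths are illustrative):

path => ["/var/log/*.log", "/var/log/messages"]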
The output section again uses the rubydebug codec, printing events in JSON format.