
II. Principle and use of logstash


I. Overview

1.1 Introduction to logstash

Logstash is a data-analysis tool whose main purpose is to process logs. The whole ELK suite can be viewed as an MVC model: logstash is the controller layer, Elasticsearch the model layer, and Kibana the view layer. Data is first sent to logstash, which filters and formats it (converting it to JSON), then forwards it to Elasticsearch for storage and search indexing. Kibana provides the front-end pages for search and chart visualization; it calls Elasticsearch's interfaces and visualizes the data returned. Logstash and Elasticsearch both run on the JVM, while Kibana is built on Node.js.

1.2 logstash architecture

Figure 1.1: logstash architecture

When logstash runs, its behavior is configured in three parts (a minimal configuration skeleton follows the list below):

Input: sets up the data source.

Filter: processes and filters the data; complex processing logic is not recommended here. This step is optional.

Output: sets the output destination.
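As a sketch only, a minimal pipeline combining the three parts could look like the following (the field name "env" and value "demo" are illustrative, not from the original):

input {
    stdin { }                                       # data source: standard input
}
filter {
    mutate { add_field => { "env" => "demo" } }     # optional processing step
}
output {
    stdout { codec => rubydebug }                   # destination: the console
}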

II. Logstash deployment

You can download the desired version directly from https://www.elastic.co/downloads/logstash; this article uses version 6.6.2. Deployment is actually very simple: just decompress the archive and it is ready to use. As with flume, the key lies in writing the collection configuration files.

It is common to start logstash in the following ways:

Debugging: start a foreground process directly:

bin/logstash -f /path/to/configfile

Production environment:

nohup bin/logstash -f /path/to/configfile > standard.log 2> error.log &

Before startup, you can use the -t option to test whether the configuration file has syntax errors, for example:

bin/logstash -f /path/to/configfile -t

III. Writing configuration files

3.1 Input configuration

3.1.1 Reading from files

Example: output the contents of a monitored file to the console.

input {
    file {
        path => ["/var/log/*.log", "/var/log/message"]
        type => "system"
        start_position => "beginning"
    }
}
output {
    stdout {
        codec => rubydebug
    }
}

There are some useful configuration items that control the behavior of the underlying FileWatch library:

discover_interval: how often logstash checks whether there are new files under the monitored path. The default is 15 seconds.

exclude: files that should not be monitored can be excluded. Glob patterns are supported here, as with path.

close_older: if a file that is already being monitored has not been updated within this interval, the file handle listening to it is closed. The default is 3600 seconds, i.e. one hour.

ignore_older: each time the file list is checked, any file whose last modification time is older than this value is ignored. The default is 86400 seconds, i.e. one day.

sincedb_path: if you do not want to use the default $HOME/.sincedb (on Windows, C:\Windows\System32\config\systemprofile\.sincedb), you can point the sincedb file to another location with this option.

sincedb_write_interval: how often logstash writes the sincedb file. The default is 15 seconds.

stat_interval: how often logstash checks monitored files for updates. The default is 1 second.

start_position: where logstash starts reading file data. The default is the end of the file, which means the logstash process runs like tail -F. If you are importing existing data, change this setting to "beginning" and logstash will read the file from the start, like less +F.
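As a sketch, several of these options can be combined in a single file input; the values below are illustrative, not recommendations (the sincedb path in particular is hypothetical):

input {
    file {
        path => ["/var/log/*.log"]
        exclude => "*.gz"                       # skip compressed archives
        start_position => "beginning"           # import existing data from the start
        sincedb_path => "/tmp/demo.sincedb"     # hypothetical custom sincedb location
        ignore_older => 86400                   # skip files untouched for a day
        stat_interval => 1                      # check monitored files every second
    }
}
output {
    stdout { codec => rubydebug }
}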

3.1.2 Standard input

The stdin plugin simply reads data from standard input. Example:

input {
    stdin {
        add_field => { "key" => "value" }
        codec => "plain"
        tags => ["add"]
        type => "std"
    }
}
output {
    stdout {
        codec => rubydebug
    }
}

Enter hello and you can see the following information printed:

hello
{
       "message" => "hello",
          "tags" => [
        [0] "add"
    ],
      "@version" => "1",
          "host" => "bigdata121",
    "@timestamp" => 2019-09-07T03:20:35.569Z,
          "type" => "std",
           "key" => "value"
}

type and tags are two special fields of a logstash event. Generally speaking, we mark the event type with type in the input section, while tags are added or removed by specific plugins during processing.
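For example, a later filter stage can manipulate tags; a minimal sketch using the mutate filter (the tag names are illustrative):

input {
    stdin { type => "std" }
}
filter {
    mutate {
        add_tag    => ["processed"]   # appended by this filter
        remove_tag => ["add"]         # removed if present on the event
    }
}
output {
    stdout { codec => rubydebug }
}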

3.2 Codec configuration

By default, logstash only supports plain-text input and processes the data into the specified format in the filter stage. Thanks to the codec setting, however, different types of data can already be handled during input. A common mental model therefore needs correcting: logstash is not simply an input | filter | output data stream, but an input | decode | filter | encode | output data stream. Codecs are used to decode and encode events.

Example: entering data in JSON format.

input {
    stdin {
        add_field => { "key" => "value" }
        codec => "json"
        type => "std"
    }
}
output {
    stdout {
        codec => rubydebug
    }
}

When JSON data is entered, it is parsed automatically.

Input: {"name": "king"}

Output:

{
          "name" => "king",
          "host" => "bigdata121",
    "@timestamp" => 2019-09-07T04:05:42.550Z,
      "@version" => "1",
          "type" => "std",
           "key" => "value"
}

3.3 Filter configuration

3.3.1 The grok plugin

logstash has rich filter plugins. They transform the raw data entering the filter, perform complex logic processing, and can even add new logstash events to subsequent processing out of nothing. Grok is one of the most important plugins for logstash, and by far the best way to make poor, unstructured logs structured and searchable. It is very good at parsing logs in many formats, such as syslog, Apache and other web-server logs, and MySQL logs.

Configuration:

input {
    stdin {
        type => "std"
    }
}
filter {
    grok {
        match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
    }
}
output {
    stdout {
        codec => rubydebug
    }
}

Input: 55.3.244.1 GET /index.html 15824 0.043

Output:

{
      "@version" => "1",
          "host" => "zzc-203",
       "request" => "/index.html",
         "bytes" => "15824",
      "duration" => "0.043",
        "method" => "GET",
    "@timestamp" => 2019-03-19T05:09:55.777Z,
       "message" => "55.3.244.1 GET /index.html 15824 0.043",
          "type" => "std",
        "client" => "55.3.244.1"
}

The syntax of a grok pattern is %{SYNTAX:SEMANTIC}.

SYNTAX: the type of the matched value. For example, 3.44 can be matched with the NUMBER type, and 127.0.0.1 with the IP type.

SEMANTIC: the name of the variable that stores the matched value. For example, 3.44 may be the duration of an event, and 127.0.0.1 may be the requesting client address, so the two values can be matched with %{NUMBER:duration} %{IP:client}.

You can also add data-type conversions to a grok pattern. By default, all semantics are saved as strings. To convert a semantic's data type, for example to change a string to an integer, append the target data type as a suffix: %{NUMBER:num:int} converts the num semantic from a string to an integer. The only conversions currently supported are int and float. A sketch of such a conversion follows at the end of this section.

Logstash ships with about 120 patterns. You can find them at https://github.com/logstash-plugins/logstash-patterns-core/tree/master/patterns

Custom patterns: most of the time, when grok does not provide the matching type you need, you can define your own.

① Create a directory called patterns, and in it a file named postfix (the file name does not matter; choose whatever you like). In the file, write the pattern name, a space, and then the regular expression for the pattern. For example:

POSTFIX_QUEUEID [0-9A-F]{10,11}

② Then use the patterns_dir setting of the plugin to tell logstash that this directory holds your custom patterns.

Configuration:

input {
    stdin {
        type => "std"
    }
}
filter {
    grok {
        patterns_dir => ["./patterns"]
        match => { "message" => "%{SYSLOGBASE} %{POSTFIX_QUEUEID:queue_id}: %{GREEDYDATA:syslog_message}" }
    }
}
output {
    stdout {
        codec => rubydebug
    }
}

Input: Jan 1 06:25:43 mailserver14 postfix/cleanup[21403]: BEF25A72965: message-id=...

Output (abridged):

{
     "queue_id" => "BEF25A72965",
      "message" => "Jan 1 06:25:43 mailserver14 postfix/cleanup[21403]: BEF25A72965: message-id=...",
    ...
}
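As promised above, here is a minimal sketch of the type-conversion suffix (the log format and field names are illustrative):

input {
    stdin { type => "std" }
}
filter {
    grok {
        # ":int" and ":float" cast the captured strings to numbers;
        # these are the only two conversions grok supports.
        match => { "message" => "%{IP:client} %{NUMBER:bytes:int} %{NUMBER:duration:float}" }
    }
}
output {
    stdout { codec => rubydebug }
}

Feeding it a line such as 55.3.244.1 15824 0.043 should emit bytes as an integer and duration as a float in the rubydebug output, rather than as quoted strings.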
