Introduction to the Logstash filter plugin


1. The Logstash filter plugin

1.1. The grok regular-expression capture

Grok is a very powerful Logstash filter plugin: by applying regular-expression parsing to arbitrary text, it turns unstructured log data into something structured and easy to query. It is the best way to parse unstructured log data in Logstash.

The syntax rules for grok are:

%{SYNTAX:SEMANTIC}

"Grammar" refers to matching patterns. For example, NUMBER mode can be used to match numbers, and IP mode will match IP addresses such as 127.0.0.1.

Now let's apply this to a real example. Our experimental data are as follows:

172.16.213.132 [07/Feb/2018:16:24:19 +0800] "GET / HTTP/1.1" 403 5039

1) First, an example of filtering out the IP

input {
  stdin {}
}
filter {
  grok {
    match => { "message" => "%{IPV4:ip}" }
  }
}
output {
  stdout {}
}

Now start it up:

[root@172.31.22.29 /etc/logstash/conf.d]# /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/l2.conf
Sending Logstash logs to /var/log/logstash which is now configured via log4j2.properties
172.16.213.132 [07/Feb/2018:16:24:19 +0800] "GET / HTTP/1.1" 403 5039    # manually enter this line of information
{
       "message" => "172.16.213.132 [07/Feb/2018:16:24:19 +0800] \"GET / HTTP/1.1\" 403 5039",
            "ip" => "172.16.213.132",
      "@version" => "1",
          "host" => "ip-172-31-22-29.ec2.internal",
    "@timestamp" => 2019-01-22T09:48:15.354Z
}

2) An example of filtering out the timestamp

The input and output sections are omitted here.

filter {
  grok {
    match => { "message" => "%{IPV4:ip}\ \[%{HTTPDATE:timestamp}\]" }
  }
}

Next, let's filter it:

[root@172.31.22.29 /etc/logstash/conf.d]# /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/l2.conf
Sending Logstash logs to /var/log/logstash which is now configured via log4j2.properties
172.16.213.132 [07/Feb/2018:16:24:19 +0800] "GET / HTTP/1.1" 403 5039    # manually enter this line of information
{
      "@version" => "1",
     "timestamp" => "07/Feb/2018:16:24:19 +0800",
    "@timestamp" => 2019-01-22T10:16:14.205Z,
       "message" => "172.16.213.132 [07/Feb/2018:16:24:19 +0800] \"GET / HTTP/1.1\" 403 5039",
            "ip" => "172.16.213.132",
          "host" => "ip-172-31-22-29.ec2.internal"
}

You can see that the filtering succeeded: grok actually filters using the regular expressions in the configuration file. Let's do a small experiment. For example, I now add two "-" after the IP in the data, as shown below:

172.16.213.132 - - [07/Feb/2018:16:24:19 +0800] "GET / HTTP/1.1" 403 5039

So the configuration file now needs to be written like this:

filter {
  grok {
    match => { "message" => "%{IPV4:ip}\ -\ -\ \[%{HTTPDATE:timestamp}\]" }
  }
}

The match pattern has to account for the two "-" characters this time; otherwise grok cannot match, and therefore cannot parse, the data.

Start it up to see the results:

[root@172.31.22.29 /etc/logstash/conf.d]# /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/l2.conf
Sending Logstash logs to /var/log/logstash which is now configured via log4j2.properties
172.16.213.132 - - [07/Feb/2018:16:24:19 +0800] "GET / HTTP/1.1" 403 5039    # manually enter this line, then press Enter
{
    "@timestamp" => 2019-01-22T10:25:46.687Z,
            "ip" => "172.16.213.132",
       "message" => "172.16.213.132 - - [07/Feb/2018:16:24:19 +0800] \"GET / HTTP/1.1\" 403 5039",
     "timestamp" => "07/Feb/2018:16:24:19 +0800",
      "@version" => "1",
          "host" => "ip-172-31-22-29.ec2.internal"
}

With that, we have the information we want; here we matched both the IP and the time. Of course, you can also match the time alone:

filter {
  grok {
    match => { "message" => "\ -\ -\ \[%{HTTPDATE:timestamp}\]" }
  }
}

At this point we can understand better that grok matches data with regular expressions.

It is important to note that escape characters are needed to match the spaces and square brackets in the pattern.

3) Filter the header information

First, let's write the matching regular pattern.

filter {
  grok {
    match => { "message" => "%{QS:referrer}" }
  }
}

Start it up and see the results:

[root@172.31.22.29 /etc/logstash/conf.d]# /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/l2.conf
Sending Logstash logs to /var/log/logstash which is now configured via log4j2.properties
172.16.213.132 - - [07/Feb/2018:16:24:19 +0800] "GET / HTTP/1.1" 403 5039    # manually enter this line of information
{
    "@timestamp" => 2019-01-22T10:47:37.127Z,
       "message" => "172.16.213.132 - - [07/Feb/2018:16:24:19 +0800] \"GET / HTTP/1.1\" 403 5039",
      "@version" => "1",
          "host" => "ip-172-31-22-29.ec2.internal",
      "referrer" => "\"GET / HTTP/1.1\""
}

4) Next, let's try to extract the time information from a /var/log/messages entry.

Example data:

Jan 20 11:33:03 ip-172-31-22-29 systemd: Removed slice User Slice of root.

Our goal is to output time, that is, the first three columns.

Now we need a matching pattern. We can look it up in the grok-patterns file under the /usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-patterns-core-4.1.2/patterns directory, where we find this definition:

SYSLOGTIMESTAMP %{MONTH} +%{MONTHDAY} %{TIME}

It fits the log line above very well.

First, write the configuration file.

filter {
  grok {
    match        => { "message" => "%{SYSLOGTIMESTAMP:time}" }
    remove_field => ["message"]
  }
}

Start it up and see what's going on:

[root@172.31.22.29 /etc/logstash/conf.d]# /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/l4.conf
Sending Logstash logs to /var/log/logstash which is now configured via log4j2.properties
Jan 20 11:33:03 ip-172-31-22-29 systemd: Removed slice User Slice of root.    # manually enter this line of information
{
    "@timestamp" => 2019-01-22T11:54:26.646Z,
          "host" => "ip-172-31-22-29.ec2.internal",
      "@version" => "1",
          "time" => "Jan 20 11:33:03"
}

You can see that the result has been converted successfully; grok is a very useful tool.
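Incidentally, if none of the stock patterns in grok-patterns fits your log, grok can also load pattern files of your own through its patterns_dir option. A minimal sketch, assuming a hypothetical pattern file /etc/logstash/patterns/extra that defines a MYTIME pattern by reusing the stock MONTH, MONTHDAY, and TIME patterns:

# contents of the hypothetical file /etc/logstash/patterns/extra:
#   MYTIME %{MONTH} +%{MONTHDAY} %{TIME}

filter {
  grok {
    patterns_dir => ["/etc/logstash/patterns"]   # directory holding our custom pattern file
    match        => { "message" => "%{MYTIME:time}" }
  }
}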

1.2. The date plugin

In the example above we extracted the timestamp field, which records the time taken from the log. But besides the timestamp you extracted, the output also contains an @timestamp line, and the two times differ: @timestamp holds the system's current time. In the ELK stack, Elasticsearch uses the @timestamp field to mark when the log was produced, so if the two diverge the log timeline becomes confused. To solve this problem we need another plugin, the date plugin, which converts the time string in a log record into a LogStash::Timestamp object and then saves it into the @timestamp field.

Next, let's configure it in the configuration file:

filter {
  grok {
    match => { "message" => "\ -\ -\ \[%{HTTPDATE:timestamp}\]" }
  }
  date {
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
  }
}

Note: the time-zone offset is handled by the letter Z. Also, "dd/MMM/yyyy" here really does have three uppercase M's in the middle; I tried writing only two, and the conversion failed.

Let's start it and see how it works:

[root@172.31.22.29 /etc/logstash/conf.d]# /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/l2.conf
Sending Logstash logs to /var/log/logstash which is now configured via log4j2.properties
172.16.213.132 - - [07/Feb/2018:16:24:19 +0800] "GET / HTTP/1.1" 403 5039    # manually enter this line of information
{
          "host" => "ip-172-31-22-29.ec2.internal",
     "timestamp" => "07/Feb/2018:16:24:19 +0800",
    "@timestamp" => 2018-02-07T08:24:19.000Z,
       "message" => "172.16.213.132 - - [07/Feb/2018:16:24:19 +0800] \"GET / HTTP/1.1\" 403 5039",
      "@version" => "1"
}

You will find that the @timestamp conversion succeeded: it now carries the log's own date, even though I wrote this post on January 22, 2019. One more thing: the converted time is 8 hours earlier. Did you notice? Keep reading.
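As an aside, if you would rather leave @timestamp untouched and keep the parsed time in a separate field, the date plugin also accepts a target option. A minimal sketch, with log_time as a field name of our own choosing:

date {
  match  => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
  target => "log_time"   # write the parsed time here instead of overwriting @timestamp
}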

1.3. The usage of remove_field

The usage of remove_field is also very common. Its function is to remove redundancy: as you can see in the previous examples, whatever we output appears twice, once inside message and once again in the extracted field such as the HTTPDATE timestamp or the IP. The purpose of filtering is to keep the useful information without repetition, so let's see how to remove the duplicate.

1) Let's take the output IP as an example:

filter {
  grok {
    match        => { "message" => "%{IP:ip_address}" }
    remove_field => ["message"]
  }
}

Start the service to check:

[root@172.31.22.29 /etc/logstash/conf.d]# /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/l5.conf
Sending Logstash logs to /var/log/logstash which is now configured via log4j2.properties
172.16.213.132 - - [07/Feb/2018:16:24:19 +0800] "GET / HTTP/1.1" 403 5039    # manually enter this line, then press Enter
{
    "ip_address" => "172.16.213.132",
          "host" => "ip-172-31-22-29.ec2.internal",
      "@version" => "1",
    "@timestamp" => 2019-01-22T12:16:58.918Z
}

At this point you will find that the message line shown before is gone. Because we removed it with remove_field, the benefit is obvious: we keep only the specific information we need from the log.
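One detail worth knowing: when remove_field sits inside the grok block, as here, it is applied only after a successful match, so a line that grok fails to parse keeps its message intact. A minimal sketch:

filter {
  grok {
    match        => { "message" => "%{IP:ip_address}" }
    remove_field => ["message"]   # runs only when the grok match succeeds
  }
}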

2) In the examples above we extracted the pieces of the message one at a time; now let's extract them all at once in a single Logstash configuration.

Let's first configure it in the configuration file:

filter {
  grok {
    match => { "message" => "%{IP:ip_address}\ -\ -\ \[%{HTTPDATE:timestamp}\]\ %{QS:referrer}\ %{NUMBER:status}\ %{NUMBER:bytes}" }
  }
  date {
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
  }
}

Turn it on and see what happens:

[root@172.31.22.29 /etc/logstash/conf.d]# /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/l5.conf
Sending Logstash logs to /var/log/logstash which is now configured via log4j2.properties
172.16.213.132 - - [07/Feb/2018:16:24:19 +0800] "GET / HTTP/1.1" 403 5039    # manually enter this line of information
{
        "status" => "403",
         "bytes" => "5039",
       "message" => "172.16.213.132 - - [07/Feb/2018:16:24:19 +0800] \"GET / HTTP/1.1\" 403 5039",
    "ip_address" => "172.16.213.132",
     "timestamp" => "07/Feb/2018:16:24:19 +0800",
    "@timestamp" => 2018-02-07T08:24:19.000Z,
      "referrer" => "\"GET / HTTP/1.1\"",
      "@version" => "1",
          "host" => "ip-172-31-22-29.ec2.internal"
}

In this example you can feel how bloated the output is: the content is effectively output twice, so it is necessary to remove the original message line.

3) Use remove_field to remove the message line.

First, let's modify the configuration file:

filter {
  grok {
    match => { "message" => "%{IP:ip_address}\ -\ -\ \[%{HTTPDATE:timestamp}\]\ %{QS:referrer}\ %{NUMBER:status}\ %{NUMBER:bytes}" }
  }
  date {
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
  }
  mutate {
    remove_field => ["message", "timestamp"]
  }
}

Start it up and take a look:

[root@172.31.22.29 /etc/logstash/conf.d]# /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/l5.conf
Sending Logstash logs to /var/log/logstash which is now configured via log4j2.properties

This time the message and timestamp fields no longer appear in the output; that is exactly the end result we want.

1.4. Time processing (date)

The use of date has come up in several examples above. The date plugin is particularly important for sorting events and backfilling old data: it converts the time field in a log record into a LogStash::Timestamp object and then stores it in the @timestamp field.

Why use this plugin?

1. On the one hand, Logstash automatically stamps every collected event with a timestamp (@timestamp), but this timestamp records when the input received the data, not when the log was generated (the two are necessarily different), which can make searching the data confusing.

2. On the other hand, in the rubydebug-encoded output above, even though @timestamp has taken the value of the timestamp field, it is still 8 hours behind Beijing time. This is because Elasticsearch internally stores all time-type fields in UTC, and keeping logs in UTC is a consensus in the international security and operations communities. It does not really matter, because ELK has a solution: on the Kibana web page, the program automatically reads the browser's current time zone and converts UTC times to that zone for display.
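Related to this, if a log's time string carried no explicit offset such as +0800, the date filter could not infer the zone on its own; its timezone option tells it which zone to assume before converting to UTC. A minimal sketch:

date {
  match    => ["timestamp", "dd/MMM/yyyy:HH:mm:ss"]   # no Z here: this text has no offset
  timezone => "Asia/Shanghai"                         # assume Beijing time when parsing
}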

To parse your time text, the syntax uses letters to indicate which component of the date or time is meant (year, month, day, hour, minute, and so on), and repeats the letter to indicate the width of the value. The "dd/MMM/yyyy:HH:mm:ss Z" seen above uses exactly this form. The meanings of the main characters are:

y  year              (yyyy = 2018, yy = 18)
M  month             (MM = 02, MMM = Feb, MMMM = February)
d  day of the month  (dd = 07)
H  hour, 0-23        (HH = 16)
m  minute            (mm = 24)
s  second            (ss = 19)
Z  time zone offset  (+0800)

So on what basis do we write the form "dd/MMM/yyyy:HH:mm:ss Z"?

This point can be hard to grasp, so let me try to make it concrete. The experimental data above are:

172.16.213.132 - - [07/Feb/2018:16:24:19 +0800] "GET / HTTP/1.1" 403 5039

To convert this time we must write "dd/MMM/yyyy:HH:mm:ss Z". Notice the three M's in the middle: two will not work, because the table shows that MM stands for a two-digit numeric month, while in the text we are parsing the month is an abbreviated English name, which corresponds to MMM. Finally, the capital Z at the end is required because the text contains the "+0800" time-zone offset; without it the filter cannot parse the text correctly and the timestamp conversion fails.
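Note also that match accepts several candidate formats after the field name and tries them in order, which helps when one source mixes time formats. A minimal sketch:

date {
  match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z", "ISO8601"]   # try the Apache format first, then ISO 8601
}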

1.5. Data modification: the mutate plugin

The mutate plugin is another very important Logstash plugin. It provides rich processing capabilities for basic data types, including renaming, deleting, replacing, and modifying fields in log events. Here are several commonly used mutate options: convert for field type conversion, gsub for regular-expression replacement within a field, split for splitting a string into an array on a separator, rename for renaming a field, and remove_field for deleting a field.

1) Field type conversion: convert

Modify the configuration file first:

filter {
  grok {
    match        => { "message" => "%{IPV4:ip}" }
    remove_field => ["message"]
  }
  mutate {
    convert => ["ip", "string"]
  }
}

It can also be written this way; there is little difference between the two notations:

filter {
  grok {
    match        => { "message" => "%{IPV4:ip}" }
    remove_field => ["message"]
  }
  mutate {
    convert => { "ip" => "string" }
  }
}

Now let's start the service to see the effect:

[root@172.31.22.29 /etc/logstash/conf.d]# /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/l6.conf
Sending Logstash logs to /var/log/logstash which is now configured via log4j2.properties
172.16.213.132 - - [07/Feb/2018:16:24:9 +0800] "GET / HTTP/1.1" 403 5039    # manually enter this line of information
{
    "@timestamp" => 2019-01-23T04:13:55.261Z,
            "ip" => "172.16.213.132",
          "host" => "ip-172-31-22-29.ec2.internal",
      "@version" => "1"
}

With the ip field the effect may not be obvious here, but it has indeed been converted to the string type.
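A more typical use of convert is turning numeric strings into real numbers, for example the status and bytes fields captured by NUMBER earlier, so that Elasticsearch can run range queries and aggregations on them. A minimal sketch:

mutate {
  convert => {
    "status" => "integer"
    "bytes"  => "integer"
  }
}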

2) Regular-expression replacement of matched fields: gsub

gsub replaces values matched by a regular expression within a field; note that it is only valid for string fields.

First, take a look at the modified configuration file.

filter {
  grok {
    match        => { "message" => "%{QS:referrer}" }
    remove_field => ["message"]
  }
  mutate {
    gsub => ["referrer", "/", "-"]
  }
}

Start it and see how it works:

172.16.213.132 - - [07/Feb/2018:16:24:9 +0800] "GET / HTTP/1.1" 403 5039    # manually enter this line of information
{
          "host" => "ip-172-31-22-29.ec2.internal",
    "@timestamp" => 2019-01-23T05:51:30.786Z,
      "@version" => "1",
      "referrer" => "\"GET - HTTP-1.1\""
}

Sure enough: every "/" in the QS portion has been replaced with a dash.
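gsub takes any number of field/pattern/replacement triples in one flat array, so several substitutions can be done at once. A minimal sketch (the second triple, stripping the embedded quotes, is our own addition):

mutate {
  gsub => [
    "referrer", "/", "-",   # as above: replace slashes with dashes
    "referrer", "\"", ""    # additionally remove the quote characters
  ]
}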

3) Split a string into an array on a delimiter: split

split separates the string in a field into an array on a specified delimiter.

First, the configuration file:

filter {
  mutate {
    split     => ["message", "-"]
    add_field => { "A is lower case" => "%{[message][0]}" }
  }
}

What this means is: split the message field into an array on "-", and add a field holding the first element of the array.

Start it up:

a-b-c-d-e-f-g    # manually enter this line, then press Enter
{
    "A is lower case" => "a",
            "message" => [
        [0] "a",
        [1] "b",
        [2] "c",
        [3] "d",
        [4] "e",
        [5] "f",
        [6] "g"
    ],
               "host" => "ip-172-31-22-29.ec2.internal",
           "@version" => "1",
         "@timestamp" => 2019-01-23T06:07:18.062Z
}

4) Rename a field: rename

Rename allows you to rename a field.

filter {
  grok {
    match        => { "message" => "%{IPV4:ip}" }
    remove_field => ["message"]
  }
  mutate {
    convert => { "ip" => "string" }
    rename  => { "ip" => "IP" }
  }
}

Here the rename option is written with curly braces {}. In fact, we can also use square brackets to achieve the same thing:

mutate {
  convert => { "ip" => "string" }
  rename  => ["ip", "IP"]
}

Check after startup:

172.16.213.132 - - [07/Feb/2018:16:24:9 +0800] "GET / HTTP/1.1" 403 5039    # manually enter this line of information
{
      "@version" => "1",
    "@timestamp" => 2019-01-23T06:20:21.423Z,
          "host" => "ip-172-31-22-29.ec2.internal",
            "IP" => "172.16.213.132"
}

5) Deleting fields needs no further discussion; we already used remove_field in the examples above.

6) Add a field: add_field

add_field is mostly used together with split, mainly to output the fields separated by split in a specified format, as in the split example:

filter {
  mutate {
    split     => ["message", "-"]
    add_field => { "A is lower case" => "%{[message][0]}" }
  }
}

After you add a field, it is displayed in the output in the same way as built-in fields such as @timestamp.
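add_field is not limited to split results: the %{field} syntax can interpolate any existing field into a new one. A minimal sketch, with the field name source_summary being our own invention:

mutate {
  add_field => { "source_summary" => "%{host} saw %{ip_address}" }
}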

1.6. geoip address query and classification

GeoIP is a commonly used free IP-address geolocation library. Given an IP address, geoip can provide the corresponding regional information, including country, province and city, longitude and latitude, and so on. This plugin is very useful for map visualizations and regional statistics.

First of all, let's modify the configuration file:

filter {
  grok {
    match        => { "message" => "%{IP:ip}" }
    remove_field => ["message"]
  }
  geoip {
    source => "ip"
  }
}

The match line in the middle can also be written in the following form:

grok {
  match        => ["message", "%{IP:ip}"]
  remove_field => ["message"]
}

Start it and see how it works:

[root@172.31.22.29 /etc/logstash/conf.d]# /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/l7.conf
Sending Logstash logs to /var/log/logstash which is now configured via log4j2.properties
114.55.68.111 - - [07/Feb/2018:16:24:9 +0800] "GET / HTTP/1.1" 403 5039    # manually enter this line of information
{
            "ip" => "114.55.68.111",
         "geoip" => {
          "city_name" => "Hangzhou",
        "region_code" => "33",
           "location" => {
            "lat" => 30.2936,
            "lon" => 120.1614
        },
          "longitude" => 120.1614,
           "latitude" => 30.2936,
      "country_code2" => "CN",
           "timezone" => "Asia/Shanghai",
                 "ip" => "114.55.68.111",
      "country_code3" => "CN",
     "continent_code" => "AS",
       "country_name" => "China",
        "region_name" => "Zhejiang"
    },
          "host" => "ip-172-31-22-29.ec2.internal",
      "@version" => "1",
    "@timestamp" => 2019-01-23T06:47:51.200Z
}

Success.

But not everything above is what we want, so we can output fields selectively.

Continue to modify the content as follows:

filter {
  grok {
    match        => ["message", "%{IP:ip}"]
    remove_field => ["message"]
  }
  geoip {
    source => "ip"
    target => "geoip"
    fields => ["city_name", "region_name", "country_name", "ip"]
  }
}

Start it up and take a look:

114.55.68.111 - - [07/Feb/2018:16:24:9 +0800] "GET / HTTP/1.1" 403 5039    # manually enter this line of information
{
    "@timestamp" => 2019-01-23T06:57:29.955Z,
            "ip" => "114.55.68.111",
         "geoip" => {
           "city_name" => "Hangzhou",
                  "ip" => "114.55.68.111",
        "country_name" => "China",
         "region_name" => "Zhejiang"
    },
      "@version" => "1",
          "host" => "ip-172-31-22-29.ec2.internal"
}

You can see that the output now contains much less, and we can output exactly the fields we want.
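If the GeoIP database bundled with Logstash is out of date for your traffic, geoip also accepts a database option pointing at your own copy of a GeoLite2 city database. A minimal sketch, with a hypothetical local path:

geoip {
  source   => "ip"
  database => "/etc/logstash/GeoLite2-City.mmdb"   # hypothetical local database file
}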

1.7. Comprehensive application of filter plugins

Our business example is as follows:

112.195.209.90 - - [20/Feb/2018:12:12:14 +0800] "GET / HTTP/1.1" 200 190 "-" "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Mobile Safari/537.36" "-"

Double quotes, single quotes, and square brackets in the log cannot be parsed as-is and must be escaped in the pattern. For more information, see: https://www.cnblogs.com/ysk123/p/9858387.html

Now let's modify the configuration file to match it:

filter {
  grok {
    match => ["message", "%{IPORHOST:client_ip}\ -\ -\ \[%{HTTPDATE:timestamp}\]\ %{QS:referrer}\ %{NUMBER:status}\ %{NUMBER:bytes}\ \"-\"\ \"%{DATA:browser_info}\ %{GREEDYDATA:extra_info}\""]
  }
  geoip {
    source => "client_ip"
    target => "geoip"
    fields => ["city_name", "region_name", "country_name", "ip"]
  }
  date {
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
  }
  mutate {
    remove_field => ["message", "timestamp"]
  }
}

Then start it to see the effect:

[root@172.31.22.29 /etc/logstash/conf.d]# /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/l9.conf
Sending Logstash logs to /var/log/logstash which is now configured via log4j2.properties
112.195.209.90 - - [20/Feb/2018:12:12:14 +0800] "GET / HTTP/1.1" 200 190 "-" "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Mobile Safari/537.36" "-"    # manually enter this line of information
{
        "referrer" => "\"GET / HTTP/1.1\"",
           "bytes" => "190",
       "client_ip" => "112.195.209.90",
      "@timestamp" => 2018-02-20T04:12:14.000Z,
    "browser_info" => "Mozilla/5.0",
      "extra_info" => "(Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Mobile Safari/537.36",
          "status" => "200",
            "host" => "ip-172-31-22-29.ec2.internal",
        "@version" => "1",
           "geoip" => {
           "city_name" => "Chengdu",
         "region_name" => "Sichuan",
        "country_name" => "China",
                  "ip" => "112.195.209.90"
    }
}

The line we typed is the input above; the block below it is the information the system fed back to us.

From this output we can see that the information has been filtered successfully. Very good.

Note one more thing: when matching information, GREEDYDATA and DATA have different matching mechanisms. GREEDYDATA is greedy and matches as much as it can, while DATA matches as little as possible. You can get a feel for the difference from the example above.
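A quick way to feel the difference is to apply both to the same made-up text. DATA compiles to the lazy regex .*? while GREEDYDATA compiles to the greedy .*, so the sketch below splits at the first possible point:

# pattern: %{DATA:first}b=%{GREEDYDATA:rest}
# input:   "a=1 b=2 b=3"
# result:  first => "a=1 "    (DATA stops at the first "b=")
#          rest  => "2 b=3"   (GREEDYDATA swallows everything after it)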
