
Analysis of Common Logstash Syntax Cases (2)


Abstract

This article is mainly about the Filter plugins; the previous part worked through various examples of handling nginx logs.

Let's continue with the plugins.

1. Filter plugins

Grok: regular-expression capture

Date: time handling

Mutate: data modification

GeoIP: IP-based geographic lookup

JSON: encoding and decoding

Grok: parse and structure any text.

http://grokdebug.herokuapp.com/patterns# : the matching rules. Pay attention to spaces; a space that fails to match will also cause an error.

http://grokdebug.herokuapp.com/ : match checking and syntax hints.

Grok is currently the best way for Logstash to parse unstructured log data into structured, queryable data. Logstash ships with 120 built-in matching patterns, which cover most needs.
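Beyond the built-ins, custom patterns can be loaded from a directory. A minimal sketch, assuming a hypothetical pattern file and pattern name MYID:

# /path/to/patterns/extra contains the line:  MYID [0-9A-F]{10,11}
filter {
  grok {
    patterns_dir => "/path/to/patterns"
    match => { "message" => "%{MYID:queue_id}" }
  }
}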

Format:

filter {
  grok {
    match => { "message" => "grok_pattern" }
  }
}

Note:

grok_pattern consists of one or more %{SYNTAX:SEMANTIC} elements. SYNTAX is the name of a pattern provided by grok; for example, the numeric pattern is named NUMBER and the IP-address pattern is named IP. SEMANTIC is the name you give to the captured value; for example, a captured IP field could be named client.
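For instance, given a hypothetical line 55.3.244.1 GET /index.html, the pattern below names the captured values client, method, and request:

%{IP:client} %{WORD:method} %{URIPATHPARAM:request}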

Simple example:

# cat conf.d/test.conf
input {
  stdin {}        # input is standard input
}
filter {
  grok {          # the grok plugin
    # patterns_dir => "/path/to/patterns"   # matching rules can be kept in a file for easier management
    match => { "message" => "%{WORD} %{NUMBER:request_time:float} %{WORD}" }
    # WORD matches a word; NUMBER matches numeric values (int and float are supported).
    # The matched number is assigned to the request_time field.
    # remove_field => ["message"]           # delete the message field from the result
  }
}
output {          # output is standard output
  stdout { codec => rubydebug }             # output format is rubydebug
}

Results:

# ./bin/logstash -f conf.d/test.conf
Logstash startup completed
begin 123.456 end
{
         "message" => "begin 123.456 end",         # not shown if remove_field is enabled
        "@version" => "1",
      "@timestamp" => "2016-05-09T02:43:47.952Z",
            "host" => "dev-online",
    "request_time" => 123.456                      # the field added by the grok match
}

Processing nginx logs:

Because the nginx log is already emitted as JSON, it is decoded into key/value pairs and printed in rubydebug format.

Suppose we now want to filter out the unwanted fields:

input {
  file {
    path => "/var/log/nginx/access.log"
    type => "json"
    codec => "json"
    start_position => "beginning"
  }
}
filter {
  grok {
    match => {
      "@timestamp" => "%{WORD}"    # first match the unwanted fields
      "type"       => "%{WORD}"
    }
    remove_field => ["@timestamp", "type"]   # then remove them
  }
}
output {
  stdout { codec => rubydebug }
}


Nginx log json format:

log_format json '{"@timestamp":"$time_iso8601",'
                '"@version":"1",'
                '"host":"$server_addr",'
                '"client":"$remote_addr",'
                '"size":$body_bytes_sent,'
                '"responsetime":$request_time,'
                '"domain":"$host",'
                '"url":"$request",'
                '"refer":"$http_referer",'
                '"agent":"$http_user_agent",'
                '"status":"$status"}';
access_log /var/log/nginx/access.log json;

Common grok patterns for nginx log fields:

nginx variable             grok pattern                          notes
$remote_addr               %{IPORHOST:clientip}
$remote_user               %{NOTSPACE:remote_user}
[$time_local]              \[%{HTTPDATE:timestamp}\]             "[" and "]" are special characters and must be escaped
"$request"                 "%{WORD:method} %{URIPATHPARAM:request} HTTP/%{NUMBER:httpversion}"
                           the request is quoted in the log, so match the quotes too; WORD matches the method (GET, POST, ...), URIPATHPARAM matches the requested URI, and NUMBER captures the HTTP protocol version into httpversion
$status                    %{NUMBER:status}                      the returned status code
$body_bytes_sent           %{NUMBER:response}                    the response size
"$http_referer"            %{QS:referrer}                        QS matches a quoted string: the referer
"$http_user_agent"         %{QS:agent}                           the user agent
"$http_x_forwarded_for"    %{QS:xforwardedfor}                   the X-Forwarded-For header
$upstream_addr             %{IPV4:upstream}:%{POSINT:port}
$scheme                    %{WORD:scheme}                        matches http or https

Example nginx log format:

log_format access '$remote_addr - $remote_user [$time_local] "$request" '
                  '$status $body_bytes_sent "$http_referer" "$http_user_agent"';

Sample log line:

192.168.1.22 - - [20/Apr/2016:16:28:14 +0800] "GET /ask/232323.html HTTP/1.1" 200 15534 "http://test.103.100xhs.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36"

Matching rule:

%{IPORHOST:clientip} - %{NOTSPACE:remote_user} \[%{HTTPDATE:timestamp}\] "%{WORD:method} %{URIPATHPARAM:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:status} %{NUMBER:response} %{QS:referrer} %{QS:agent}

GeoIP address lookup:

GeoIP is the most common free IP geolocation database. Given an IP address, it returns the corresponding region information, including country, province, city, longitude and latitude, which is very useful for map visualizations and per-region statistics.

input {
  stdin {}
}
filter {
  geoip {
    source => "message"   # the value must be a public IP, otherwise geoip returns no data
    # fields => ["city_name", "country_code2", "country_name", "latitude", "longitude"]
    # geoip produces many fields; fields restricts which ones are emitted
  }
}
output {
  stdout { codec => rubydebug }
}

The field that geoip looks up is the one named by source; its value is the IP address to query.
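A minimal sketch of a complete pipeline, assuming the client IP is first extracted by grok into a clientip field and then looked up:

input {
  stdin {}
}
filter {
  grok {
    match => { "message" => "%{IPORHOST:clientip}" }   # extract the IP first
  }
  geoip {
    source => "clientip"                               # look up the grok-produced field
    fields => ["city_name", "country_name", "latitude", "longitude"]
  }
}
output {
  stdout { codec => rubydebug }
}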


Note: the "source" field of the geoip plug-in can be any processed field, such as "client_ip", but the field content requires

Watch out: the GeoIP database only contains public-network IP information. If a lookup finds no result, null is returned directly.

JSON:

input {
  stdin {}
}
filter {
  json {
    source => "message"   # required
  }
}
output {
  stdout { codec => rubydebug }
}

Results:

{"name": "wd", "age": "15"} {"message" = > "{\" name\ ":\" wd\ ",\" age\ ":\" 15\ "}", "@ version" = > "1", "@ timestamp" = > "2016-05-09T06:32:13.546Z", # plus a timestamp is convenient for kibana to import "host" = > "dev-online", "name" = > "wd", "age" = > "15"}

Date event handling

Note: the %{+YYYY.MM.dd} pattern commonly used later in outputs/elasticsearch must read the @timestamp field, so do not delete @timestamp in favor of a field of your own; instead, convert with filters/date first and delete your own field afterwards.

filter {
  grok {
    match => ["message", "%{HTTPDATE:logdate}"]
  }
  date {
    match => ["logdate", "dd/MMM/yyyy:HH:mm:ss Z"]
  }
}

Note: the time-zone offset in the date pattern needs only the single letter Z.
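Putting the note above into practice, a sketch that keeps @timestamp and removes the intermediate field only after the conversion:

filter {
  grok {
    match => ["message", "%{HTTPDATE:logdate}"]
  }
  date {
    match => ["logdate", "dd/MMM/yyyy:HH:mm:ss Z"]
  }
  mutate {
    remove_field => ["logdate"]   # safe to delete only after filters/date has filled @timestamp
  }
}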

Mutate data modification

1. Type conversion

The types that can be converted to are "integer", "float", and "string". For example:

filter {
  mutate {
    convert => ["request_time", "float"]
  }
}

Note: besides converting simple scalar values, mutate can also convert array fields, turning ["1", "2"] into [1, 2]. It does not support the same treatment for hash fields; for those, use the filters/ruby plugin described later.
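A sketch for hash fields using filters/ruby (the counters field name is hypothetical; the event['...'] accessor matches the Logstash 1.x/2.x era of these examples):

filter {
  ruby {
    # convert every value of the hypothetical 'counters' hash to an integer
    code => "event['counters'].each { |k, v| event['counters'][k] = v.to_i } if event['counters'].is_a?(Hash)"
  }
}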

2. String processing

gsub is valid only for string-type fields:

gsub => ["urlparams", "[\\?#]", "_"]

split:

split => ["message", "|"]

Enter an arbitrary string of characters separated by |, such as 123|321|adfd|dfjld*=123, and the message field is split into the array ["123", "321", "adfd", "dfjld*=123"].

join is valid only for array-type fields.

Building on the split example above, we can join the pieces back together. The configuration becomes:

join => ["message", ","]

merge merges two array or hash fields. Continuing from the split example above:

merge => ["message", "message"]

rename renames a field; if the destination field already exists, it is overwritten:

rename => ["syslog_host", "host"]

update updates the contents of a field. If the field does not exist, it is not created.

replace acts like update, but when the field does not exist it creates it, much like the add_field parameter.
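A sketch contrasting the two (the sample field and the values are hypothetical):

filter {
  mutate {
    update  => ["sample", "changed only if sample already exists"]
    replace => ["message", "created or overwritten either way"]
  }
}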

Codec plugins

json: if the input is already well-formed JSON it can be decoded directly, and the filter/grok configuration can be omitted:

Path = > "/ var/log/nginx/access.log_json"

Codec = > "json"

multiline: merging multiple lines of data:

stdin {
  codec => multiline {
    pattern => "^\["
    negate => true
    what => "previous"
  }
}

When typing into the terminal, a line feed alone does not end the event; input accumulates until a new line beginning with [ starts the next event.
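As a sketch with hypothetical log lines, the pattern above groups input like this:

[2016-05-09 10:00:00] first event
java.lang.Exception: something failed        <- merged into the event above
    at example.Main.run(Main.java:42)        <- merged into the event above
[2016-05-09 10:00:05] second event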
