2025-03-06 Update From: SLTechnology News&Howtos
Grok is the most important plug-in for Logstash. You can predefine a named regular expression in grok and reference it later (in the grok parameter or in other regular expressions). It works well with syslog logs, Apache and other web server logs, and MySQL logs. Grok ships with many predefined patterns, and of course you can define your own.
The syntax of grok:
%{SYNTAX:SEMANTIC}
SYNTAX is the name of a grok-defined pattern; SEMANTIC is the custom field name given to the matched text.
For example, the text 192.168.0.100 matches the IP pattern, and %{IP:client} stores it in a field named client.
If a line in a web server log has the following format:
55.3.244.1 GET /index.html 15824 0.043
we can use grok to split this information into the following fields:
%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}
Written into a configuration file, it usually looks like this:
input {
  file {
    path => "/var/log/http.log"
  }
}
filter {
  grok {
    match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
  }
}
After the grok filter runs, the event contains these fields:
client: 55.3.244.1
method: GET
request: /index.html
bytes: 15824
duration: 0.043
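To make the %{SYNTAX:SEMANTIC} expansion concrete, here is a minimal Python sketch that imitates what grok does with the expression above. The PATTERNS table holds simplified stand-ins chosen for this example, not grok's real pattern definitions, and grok_to_regex is a hypothetical helper name. Python writes a named group as (?P<name>...).

```python
import re

# Simplified stand-ins for grok's built-in patterns. These are assumptions
# made for this sketch; the real definitions are stricter.
PATTERNS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "WORD": r"\w+",
    "URIPATHPARAM": r"/\S*",
    "NUMBER": r"\d+(?:\.\d+)?",
}


def grok_to_regex(expression):
    """Replace every %{SYNTAX:SEMANTIC} with a Python named group."""
    def expand(match):
        syntax, semantic = match.group(1), match.group(2)
        return "(?P<%s>%s)" % (semantic, PATTERNS[syntax])
    return re.sub(r"%\{(\w+):(\w+)\}", expand, expression)


expression = ("%{IP:client} %{WORD:method} %{URIPATHPARAM:request} "
              "%{NUMBER:bytes} %{NUMBER:duration}")
line = "55.3.244.1 GET /index.html 15824 0.043"

# groupdict() returns one entry per SEMANTIC name, like the grok output above.
fields = re.match(grok_to_regex(expression), line).groupdict()
for name, value in fields.items():
    print("%s: %s" % (name, value))
```

Like grok's default behavior, the sketch leaves every captured value as a string.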
How to define a custom pattern?
Syntax: (?<field_name>the pattern here)
Suppose we have the following text:
begin 123.456 end
We want to capture 123.456 as a request_time field, so we can write the regular expression as follows:
\s+(?<request_time>\d+(?:\.\d+)?)\s+
Explanation:
\s: matches any whitespace character, including spaces, tabs, form feeds, and so on. Equivalent to [ \f\n\r\t\v]. + means the preceding item is matched one or more times.
(?<request_time>): this is the named-capture syntax that grok uses; request_time is the name of the field that receives the captured text.
\d+: matches one or more digits.
(?:\.\d+): a non-capturing group.
(?:pattern): matches pattern but does not capture the match, so the result is not stored for later use. This is useful when combining parts of a pattern with the or character "|". For example, "industr(?:y|ies)" is more concise than "industry|industries".
\.\d+: a dot followed by one or more digits. (?:\.\d+)? means this group occurs zero or one time; if it occurs zero times, request_time is an integer. So the match may be 123 or 123.456, but not 123.4.5.6, which would need (?:\.\d+)* instead.
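The behavior of this expression can be checked outside Logstash. The sketch below uses Python's re module as a stand-in for grok's regex engine, so the named group is spelled (?P<request_time>...) instead of (?<request_time>...). The helper name request_time and the extra sample inputs are assumptions added for illustration.

```python
import re

# Same expression as above, in Python's named-group spelling.
pattern = re.compile(r"\s+(?P<request_time>\d+(?:\.\d+)?)\s+")


def request_time(text):
    """Return the captured request_time, or None when nothing matches."""
    match = pattern.search(text)
    return match.group("request_time") if match else None


print(request_time("begin 123.456 end"))    # decimal value
print(request_time("begin 123 end"))        # fraction absent: integer
print(request_time("begin 123.4.5.6 end"))  # more than one fraction: no match

# A non-capturing group only groups; it creates no capture of its own.
plural = re.compile(r"industr(?:y|ies)")
print(plural.fullmatch("industries").groups())
```

The third call returns None because the trailing \s+ must follow the number directly, and ? lets the fractional part appear at most once.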
Test:
Create a configuration file with the following contents:
input {
  stdin {}
}
filter {
  grok {
    match => { "message" => "\s+(?<request_time>\d+(?:\.\d+)?)\s+" }
  }
}
output {
  stdout {}
}
Run the logstash process and type "begin 123.456 end"; you will see output similar to the following:
{
         "message" => "begin 123.456 end",
        "@version" => "1",
      "@timestamp" => "2014-08-09T11:55:38.186Z",
            "host" => "raochenlindeMacBook-Air.local",
    "request_time" => "123.456"
}
Exercise:
Entries in the /var/log/userlog.info log file have the following format and need a custom grok expression:
2016-05-20T20:00:15.703407+08:00 localhost [audit root/13283 as root/13283 on pts/0/172.16.100.99:64790->10.10.10.6:22]: #== session closed ==
2016-05-21T09:52:54.424055+08:00 localhost [audit root/13558 as root/13558 on pts/0/172.16.100.99:50897->10.10.10.6:22]: #== session opened ==
2016-05-21T09:53:25.687134+08:00 localhost [audit root/13558 as root/13558 on pts/0/172.16.100.99:50897->10.10.10.6:22] /root: cd /etc/logstash/conf.d/
2016-05-21T09:53:26.284741+08:00 localhost [audit root/13558 as root/13558 on pts/0/172.16.100.99:50897->10.10.10.6:22] /etc/logstash/conf.d: ll
Note that not every line in the log file above has the same format. The grok expression is as follows:
%{TIMESTAMP_ISO8601:timestamp} %{IPORHOST:login_host} \[\S+ %{USER:login_user}/%{NUMBER:pid} as %{USER:sudouser}/%{NUMBER:sudouser_pid} on %{WORD:tty}/%{NUMBER:tty_id}/%{IPORHOST:host_ip}:%{NUMBER:source_port}-\>%{IPORHOST:local_ip}:%{NUMBER:dest_port}\](?:\:|) (%{UNIXPATH:current_path} %{GREEDYDATA:command}|%{GREEDYDATA:detail})
Pay attention to the last part: (?:\:|) (%{UNIXPATH:current_path} %{GREEDYDATA:command}|%{GREEDYDATA:detail})
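As a rough check of the whole expression, the following Python sketch expands it with simplified stand-in patterns and applies it to two log lines of the kind shown in the exercise. Everything in PATTERNS is an assumption for this example and is looser than grok's real definitions; grok_to_regex and parse are hypothetical helper names, and the sample lines are written out in cleaned-up form.

```python
import re

# Simplified stand-ins for grok built-ins (assumptions for this sketch).
PATTERNS = {
    "TIMESTAMP_ISO8601":
        r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})",
    "IPORHOST": r"[\w.-]+",
    "USER": r"[\w.-]+",
    "NUMBER": r"\d+",
    "WORD": r"\w+",
    "UNIXPATH": r"(?:/[\w%!$@:.,+~-]*)+",  # this class includes ':'
    "GREEDYDATA": r".*",
}


def grok_to_regex(expression):
    """Replace every %{SYNTAX:SEMANTIC} with a Python named group."""
    return re.sub(
        r"%\{(\w+):(\w+)\}",
        lambda m: "(?P<%s>%s)" % (m.group(2), PATTERNS[m.group(1)]),
        expression,
    )


EXPRESSION = (
    r"%{TIMESTAMP_ISO8601:timestamp} %{IPORHOST:login_host} \[\S+ "
    r"%{USER:login_user}/%{NUMBER:pid} as "
    r"%{USER:sudouser}/%{NUMBER:sudouser_pid} on "
    r"%{WORD:tty}/%{NUMBER:tty_id}/%{IPORHOST:host_ip}:%{NUMBER:source_port}"
    r"->%{IPORHOST:local_ip}:%{NUMBER:dest_port}\](?:\:|) "
    r"(%{UNIXPATH:current_path} %{GREEDYDATA:command}|%{GREEDYDATA:detail})"
)
REGEX = re.compile(grok_to_regex(EXPRESSION))

LINES = [
    "2016-05-20T20:00:15.703407+08:00 localhost [audit root/13283 as "
    "root/13283 on pts/0/172.16.100.99:64790->10.10.10.6:22]: "
    "#== session closed ==",
    "2016-05-21T09:53:26.284741+08:00 localhost [audit root/13558 as "
    "root/13558 on pts/0/172.16.100.99:50897->10.10.10.6:22] "
    "/etc/logstash/conf.d: ll",
]


def parse(line):
    """Return all named fields; fields of the unused alternative are None."""
    return REGEX.match(line).groupdict()


for line in LINES:
    print(parse(line))
```

With this stand-in for UNIXPATH, the colon after the directory stays inside current_path, because ":" is one of the characters the path class accepts.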
The log lines above differ in their trailing part.
How do we handle this? The colon after the closing bracket may or may not be present, so it has to be matched as optional. The regular expression is as follows:
(?:\:|)
You can also write (\:)?, which means the colon appears zero or one time.
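A small Python sketch can confirm that the two spellings of the optional colon accept the same inputs. The three sample strings are assumptions made up for this check; the last one has two colons, so neither form matches it.

```python
import re

# Two ways to say "an optional colon after the closing bracket".
with_alternation = re.compile(r"\](?:\:|) ")  # a colon, or nothing
with_optional = re.compile(r"\](\:)? ")       # a colon, zero or one time

samples = [
    "10.10.10.6:22]: #== session closed ==",  # colon present
    "10.10.10.6:22] /root: cd /etc",          # colon absent
    "10.10.10.6:22]:: unexpected",            # two colons
]

for text in samples:
    print(bool(with_alternation.search(text)),
          bool(with_optional.search(text)),
          text)
```

The only difference between the two forms is that (\:)? also creates a capture group, while (?:\:|) does not.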
When there is no colon and the rest of the line looks like this:
/root: cd /etc/logstash/conf.d/
the grok expression is written like this:
%{UNIXPATH:current_path} %{GREEDYDATA:command}
When the rest of the line looks like this:
#== session closed ==
the grok expression is written like this:
%{GREEDYDATA:detail}
These two cases have to be combined with "|" (or), so the correct way to write it is:
(%{UNIXPATH:current_path} %{GREEDYDATA:command}|%{GREEDYDATA:detail})
Note that if the order of the alternatives above is swapped to the following:
(%{GREEDYDATA:detail}|%{UNIXPATH:current_path} %{GREEDYDATA:command})
there will be a problem. GREEDYDATA matches any text, so the first alternative always succeeds, and
/root: cd /etc/logstash/conf.d/
is matched entirely into detail.
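The effect of the alternative order can be reproduced with plain regular expressions. In this Python sketch, UNIXPATH and GREEDYDATA are simplified stand-ins chosen for the example rather than grok's real definitions, and tail_fields is a hypothetical helper that returns only the fields that were actually captured.

```python
import re

# Simplified stand-ins (assumptions for this sketch).
UNIXPATH = r"(?:/[\w%!$@:.,+~-]*)+"  # this class includes ':'
GREEDYDATA = r".*"

# Specific alternative first, catch-all last.
path_first = re.compile(
    f"(?:(?P<current_path>{UNIXPATH}) (?P<command>{GREEDYDATA})"
    f"|(?P<detail>{GREEDYDATA}))"
)
# Catch-all first: the second alternative can never be reached.
detail_first = re.compile(
    f"(?:(?P<detail>{GREEDYDATA})"
    f"|(?P<current_path>{UNIXPATH}) (?P<command>{GREEDYDATA}))"
)


def tail_fields(regex, tail):
    """Return only the named groups that took part in the match."""
    groups = regex.match(tail).groupdict()
    return {name: value for name, value in groups.items() if value is not None}


command_tail = "/root: cd /etc/logstash/conf.d/"
session_tail = "#== session closed =="

print(tail_fields(path_first, command_tail))
print(tail_fields(path_first, session_tail))
print(tail_fields(detail_first, command_tail))
```

Alternatives are tried from left to right, so the catch-all pattern belongs at the end.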