This article introduces the basics of Linux system log analysis. It covers where Linux stores its log files, how the syslog daemon, protocol, and message format work, and how to search, parse, and filter logs with command-line tools such as grep, cut, awk, and rsyslog.
Linux Syslog
Many valuable log files are created automatically for you by Linux. You can find them in the /var/log directory. Here is what this directory looks like on a typical Ubuntu system:
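Since the original screenshot is not reproduced here, below is a representative, abridged listing; the exact contents vary by distribution and installed services:

$ ls /var/log
alternatives.log  auth.log  boot.log  btmp     dmesg    dpkg.log
apt               faillog   kern.log  lastlog  syslog   wtmp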
Some of the most important Linux system logs include:
/var/log/syslog or /var/log/messages stores all global system activity data, including boot messages. Debian-based systems such as Ubuntu store them in /var/log/syslog, while RedHat-based systems such as RHEL or CentOS store them in /var/log/messages.
/var/log/auth.log or /var/log/secure stores logs from the Pluggable Authentication Modules (PAM) system, including successful logins, failed login attempts, and authentication methods. Ubuntu and Debian store authentication messages in /var/log/auth.log, while RedHat and CentOS store them in /var/log/secure.
/var/log/kern.log stores kernel error and warning data, which is particularly useful for troubleshooting problems related to a custom kernel.
/var/log/cron stores information about cron jobs. Use this data to verify that your cron jobs are running successfully.
Digital Ocean has a complete tutorial on these files, showing how rsyslog creates them in common distributions such as RedHat and CentOS.
Applications also write log files to this directory. Common server programs such as Apache, Nginx, and MySQL can write log files here. Some of these log files are created by the application itself, while others are created through syslog (see below).
What is Syslog?
How are Linux syslog files created? The answer is the syslog daemon, which listens for log messages on the syslog socket /dev/log and then writes them to the appropriate log file.
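You can watch this mechanism in action with the logger utility, which submits a message through the syslog socket. A quick sketch; the -t tag and message text are arbitrary, and the output line shown is illustrative:

$ logger -t mytest "hello from logger"
$ tail -n 1 /var/log/syslog
Apr 30 20:01:12 myhost mytest: hello from logger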
The word "syslog" represents several meanings and is often used to abbreviate one of the following names:
Syslog daemon-A program used to receive, process, and send syslog information. It can remotely send syslog to a centralized server or write to a local file. Common examples include rsyslogd and syslog-ng. In this way of use, people often say "send to syslog".
Syslog protocol-A transport protocol that specifies how logs are transmitted over the network and a definition of the data format for syslog information (see below). It is formally defined in RFC-5424. For text logs, the standard port is 514 and for encrypted logs, port is 6514. In this way of use, it is often said to be "transmitted through syslog".
Syslog information-Log information or events in syslog format, which includes a header with several standard fields. In this way of use, people often say "send syslog".
The Syslog message or event includes a header with several standard fields, which makes it easier to analyze and route. They include the timestamp, the name of the application, the classification or location of the source of information in the system, and the priority of the event.
Shown below is a log message that includes the syslog header, from the sshd daemon that controls remote logins to the system; it describes a failed login attempt:
<34>1 2003-10-11T22:14:15.003Z server1.com sshd - - pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.0.2.2
Syslog format and field
Each syslog message contains a header with several fields. These fields are structured data that make it easier to analyze and route events. Here is the format used to generate the syslog example above; you can match each value to a specific field name.
The code is as follows:
<%pri%>%protocol-version% %timestamp:::date-rfc3339% %HOSTNAME% %app-name% %procid% %msgid% %msg%\n
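As a sketch, this format string could be declared as a template in rsyslog's legacy configuration syntax and applied to a log file; the template name SyslogFormat and the destination path are arbitrary:

$template SyslogFormat,"<%pri%>%protocol-version% %timestamp:::date-rfc3339% %HOSTNAME% %app-name% %procid% %msgid% %msg%\n"
*.* /var/log/custom.log;SyslogFormat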
Below are the syslog fields you will use most often when searching or troubleshooting:
Timestamp
The timestamp (2003-10-11T22:14:15.003Z in the example above) indicates the time and date when the message was generated on the sending system. That time can differ from when another system receives the message. The example timestamp breaks down as follows:
2003-10-11 is the year, month, and day.
T is a required element of the timestamp, separating the date from the time.
22:14:15.003 is the time in 24-hour format, including the number of milliseconds (003) into the next second.
Z is an optional element indicating UTC time. Instead of Z, the example could have included an offset such as -08:00, meaning the time is offset 8 hours behind UTC, i.e., PST.
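For reference, you can generate a timestamp in this style yourself; a sketch assuming GNU date (the %3N milliseconds specifier is a GNU extension), with illustrative output:

$ date -u +"%Y-%m-%dT%H:%M:%S.%3NZ"
2003-10-11T22:14:15.003Z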
Hostname
The hostname field (server1.com in the example above) is the name of the host or system that sent the message.
Application name
The application name field (sshd in the example above) is the name of the program that sent the message.
Priority
The priority field, or pri for short (<34> in the example above), tells us how urgent or severe the event is. It is a combination of two numeric fields: the facility and the severity. The severity ranges from 7, a debug event, down to 0, an emergency. The facility describes which process or subsystem created the event; it ranges from 0, for kernel messages, to 23, for local application use.
Pri can be output in two ways. The first is as a single number, calculated as the facility value multiplied by 8, plus the severity value: pri = facility * 8 + severity. In the example above, <34> is the auth facility (4) times 8, plus the critical severity (2). The second is pri-text, output as a string in the format "facility.severity" (for example, auth.crit). The latter format is easier to read and search but takes up more storage space.
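As a quick check of that formula on the command line, using the facility and severity values from the example above:

$ echo $(( 4 * 8 + 2 ))   # auth facility (4) * 8 + critical severity (2)
34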
Analyzing Linux Logs
Logs contain a great deal of information, though extracting it is sometimes not as easy as you might think. In this article we cover some examples of basic log analysis you can do right away (just by searching), as well as some more advanced analysis that requires up-front setup but saves a lot of time later. Examples of advanced analysis include generating summary counts, filtering on field values, and so on.
We will first show you how to use several different tools on the command line, and then show how a log management tool can automate much of the heavy lifting and make log analysis easier.
Search with Grep
Searching for text is the most basic way to find information. The most common tool for searching text is grep. This command-line tool, available in most Linux distributions, lets you search your logs with regular expressions. A regular expression is a pattern written in a special language that matches text. The simplest pattern is the string you are looking for, enclosed in quotation marks.
Regular expression
Here is an example of searching for "user hoover" in the authentication log of an Ubuntu system:
The code is as follows:
$grep "user hoover" / var/log/auth.log
Accepted password for hoover from 10.0.2.2 port 4792 ssh3
Pam_unix (sshd:session): session opened for user hoover by (uid=0)
Pam_unix (sshd:session): session closed for user hoover
It can be difficult to build accurate regular expressions. For example, if we wanted to search for a number like port "4792", it might also match timestamps, URLs, and other unwanted data. In the following example on Ubuntu, it matches an Apache log line that we don't want:
The code is as follows:
$grep "4792" / var/log/auth.log
Accepted password for hoover from 10.0.2.2 port 4792 ssh3
74.91.21.46-[31/Mar/2015:19:44:32 + 0000] "GET / scripts/samples/search?q=4972 HTTP/1.0" 404 545 "-"-"
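One way to tighten the match is to anchor the number to its surrounding context; a sketch assuming GNU grep, whose \b word-boundary escape is a GNU extension:

$ grep -E 'port 4792\b' /var/log/auth.log
Accepted password for hoover from 10.0.2.2 port 4792 ssh2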
Surround search
Another useful tip is that grep can do surround search, showing you the lines before or after a match. This can help you debug whatever caused an error or problem. The -B option shows the lines before a match, and the -A option shows the lines after. For example, we know that when someone failed to log in as admin, their IP did not reverse-resolve, which means they may not have a valid domain name. This is very suspicious!
The code is as follows:
$ grep -B 3 -A 2 'Invalid user' /var/log/auth.log
Apr 28 17:06:20 ip-172-31-11-241 sshd[12545]: reverse mapping checking getaddrinfo for 216-19-2-8.commspeed.net [216.19.2.8] failed - POSSIBLE BREAK-IN ATTEMPT!
Apr 28 17:06:20 ip-172-31-11-241 sshd[12545]: Received disconnect from 216.19.2.8: 11: Bye Bye [preauth]
Apr 28 17:06:20 ip-172-31-11-241 sshd[12547]: Invalid user admin from 216.19.2.8
Apr 28 17:06:20 ip-172-31-11-241 sshd[12547]: input_userauth_request: invalid user admin [preauth]
Apr 28 17:06:20 ip-172-31-11-241 sshd[12547]: Received disconnect from 216.19.2.8: 11: Bye Bye [preauth]
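grep also accepts -C N as shorthand for showing N lines of context on both sides of each match:

$ grep -C 3 'Invalid user' /var/log/auth.log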
Tail
You can also use grep together with tail to get the last few lines of a file, or to follow the log and print lines in real time. This is useful when you are making interactive changes, such as starting a server or testing a code change.
The code is as follows:
$ tail -f /var/log/auth.log | grep 'Invalid user'
Apr 30 19:49:48 ip-172-31-11-241 sshd[6512]: Invalid user ubnt from 219.140.64.136
Apr 30 19:49:49 ip-172-31-11-241 sshd[6514]: Invalid user admin from 219.140.64.136
A detailed introduction to grep and regular expressions is beyond the scope of this guide, but Ryan's Tutorials has a more in-depth introduction.
Log management systems have higher performance and more powerful search capabilities. They usually index their data and run queries in parallel, so you can search gigabytes or terabytes of logs in seconds. By contrast, grep would take minutes or, in extreme cases, even hours. Log management systems also use Lucene-like query languages, which offer a simpler syntax for retrieving numbers, fields, and more.
Parsing with Cut, AWK, and Grok
Linux provides several command-line tools for text parsing and analysis. They are useful when you want to parse a small amount of data quickly, but processing large volumes of data can take a long time.
Cut
The cut command allows you to parse fields out of delimited logs. A delimiter is a character, such as an equals sign or comma, that separates fields or key-value pairs.
Suppose we want to extract the user name from the following log:
The code is as follows:
pam_unix(su:auth): authentication failure; logname=hoover uid=1000 euid=0 tty=/dev/pts/0 ruser=hoover rhost= user=root
We can use the cut command as follows to get the text of the eighth field when the line is split on the equals sign. Here is an example on an Ubuntu system:
The code is as follows:
$grep "authentication failure" / var/log/auth.log | cut-d'='- f 8
Root
Hoover
Root
Nagios
Nagios
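Building on this, a quick way to produce the summary counts mentioned earlier is to pipe the same output through sort and uniq; a sketch, with counts matching the sample output above:

$ grep "authentication failure" /var/log/auth.log | cut -d '=' -f 8 | sort | uniq -c
      1 hoover
      2 nagios
      2 root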
AWK
Alternatively, you can use awk, which provides more powerful field parsing. It offers a scripting language with which you can filter out almost anything irrelevant.
For example, suppose we have the following log line on an Ubuntu system, and we want to extract the user name from the failed login:
The code is as follows:
Mar 24 08:28:18 ip-172-31-11-241 sshd[32701]: input_userauth_request: invalid user guest [preauth]
You can use the awk command as follows. First, the regular expression /sshd.*invalid user/ matches the sshd invalid-user lines. Then {print $9} prints the ninth field, using the default delimiter of whitespace. This outputs the user names.
The code is as follows:
$ awk '/sshd.*invalid user/ { print $9 }' /var/log/auth.log
guest
admin
info
test
ubnt
You can read more about how to use regular expressions and print fields in the Awk User's Guide.
Log management systems
Log management systems make parsing easier, letting users quickly analyze large numbers of log files. They can automatically parse standard log formats, such as common Linux logs and web server logs. This saves a lot of time, because you don't have to write your own parsing logic while troubleshooting a system problem.
The following is an example of an sshd log message parsed into its remoteHost and user fields, in a screenshot from Loggly, a cloud-based log management service.
You can also customize the parsing of non-standard formats. A common tool for this is Grok, which uses a library of common regular expressions to parse raw text into structured JSON. Here is an example Grok configuration for parsing kernel log files inside Logstash:
The code is as follows:
filter {
  grok {
    match => { "message" => "%{CISCOTIMESTAMP:timestamp} %{HOST:host} %{WORD:program}%{NOTSPACE} %{NOTSPACE} %{NUMBER:duration}%{NOTSPACE} %{GREEDYDATA:kernel_logs}" }
  }
}
The Grok parse turns the raw line into structured fields (shown as a screenshot in the original article).
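Since the screenshot is not reproduced here, a sketch of what the parsed event might look like as JSON; the field names come from the Grok pattern above, while the values are purely illustrative:

{
  "timestamp": "Mar 24 08:28:18",
  "host": "ip-172-31-11-241",
  "program": "kernel",
  "duration": "0.003",
  "kernel_logs": "illustrative kernel message text"
}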
Filter with Rsyslog and AWK
Filtering allows you to search on a specific field value instead of doing a full-text search. This makes your log analysis more accurate, because it ignores matches from irrelevant parts of the log message. In order to search on a field value, you first need to parse the log, or at least have a way of retrieving events by their structure.
How to filter on an application
Often you only want to see the logs of one application. This is easy if your application keeps all of its records in a single file, but more complicated if you need to filter one application out of an aggregated or centralized log. There are several ways to do this:
Use the rsyslog daemon to parse and filter the logs. The example below writes the logs of the sshd application to a file called sshd-messages, and then discards the event so that it is not repeated elsewhere. You can test this example by adding it to your rsyslog.conf file.
The code is as follows:
:programname, isequal, "sshd" /var/log/sshd-messages
& ~
Use a command-line tool like awk to extract the value of a particular field, such as the sshd user name. Below is an example from an Ubuntu system.
The code is as follows:
$ awk '/sshd.*invalid user/ { print $9 }' /var/log/auth.log
guest
admin
info
test
ubnt
Use a log management system to parse the logs automatically, then click to filter on the desired application name. The following is a screenshot showing the parsed syslog fields in the Loggly log management service; we are filtering on the application name "sshd", as indicated by the Venn diagram icon.
How to filter on errors
The thing people most often want to see in their logs is errors. Unfortunately, the default syslog configuration does not output the severity of errors directly, which makes them difficult to filter on.
Here are two ways to solve the problem. First, you can modify your rsyslog configuration to output the severity in the log file, making it easier to view and search. You can do this by adding a template that uses pri-text to your rsyslog configuration, like this:
The code is as follows:
":% timegenerated%,%HOSTNAME%,%syslogtag%,%msg%n"
This example produces output in the following format. You can see the err in this message, indicating the error severity level.
The code is as follows:
<authpriv.err>: Mar 11 18:18:00,hoover-VirtualBox,su[5026]:, pam_authenticate: Authentication failure
You can then use awk or grep to retrieve just the error messages. In this example for Ubuntu, we include some of the surrounding syntax, the "." and the ">", so that we match only this field.
The code is as follows:
$ grep '.err>' /var/log/auth.log
<authpriv.err>: Mar 11 18:18:00,hoover-VirtualBox,su[5026]:, pam_authenticate: Authentication failure
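Still within rsyslog, standard selector syntax can also route messages of error severity and above to their own file, making them trivial to find; a sketch, with an arbitrary destination path:

*.err /var/log/errors.log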
Your second option is to use a log management system. A good log management system automatically parses syslog messages and extracts the severity field, and lets you filter on specific severities with a simple click.
The following is a screenshot from Loggly showing the syslog fields, with the error severity highlighted to indicate that we are filtering on errors:
This is the end of the basic tutorial on Linux system log analysis. Thank you for reading.