How to use logs in Linux to troubleshoot errors

2025-07-19 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/01 Report

This article explains how to use logs in Linux to troubleshoot errors. The methods it introduces are simple, fast, and practical.

Linux system logs

Linux automatically creates many valuable log files for you. You can find them in the /var/log directory. Here is what this directory looks like on a typical Ubuntu system:

Some of the most important Linux system logs include:

/var/log/syslog or /var/log/messages stores all global system activity data, including boot messages. Debian-based systems such as Ubuntu store it in /var/log/syslog, while Red Hat-based systems such as RHEL or CentOS store it in /var/log/messages.

/var/log/auth.log or /var/log/secure stores logs from the Pluggable Authentication Modules (PAM) system, including successful logins, failed login attempts, and authentication methods. Ubuntu and Debian store authentication messages in /var/log/auth.log, while Red Hat and CentOS store them in /var/log/secure.

/var/log/kern.log stores kernel error and warning data, which is particularly useful for troubleshooting problems related to a custom kernel.

/var/log/cron stores information about cron jobs. Use this data to verify that your cron jobs are running successfully.

DigitalOcean has a complete tutorial on these files, showing how rsyslog creates them in common distributions such as Red Hat and CentOS.

Applications also write log files in this directory. Common server programs such as Apache, Nginx, and MySQL can write their logs here. Some of these log files are created by the application itself, while others are created through syslog (see below).

What is Syslog?

How are Linux system log files created? The answer is the syslog daemon, which listens for log messages on the syslog socket /dev/log and then writes them to the appropriate log file.

The word "syslog" has several meanings and is often used as shorthand for one of the following:

Syslog daemon - A program that receives, processes, and sends syslog messages. It can forward syslog to a remote centralized server or write to a local file. Common examples include rsyslogd and syslog-ng. In this usage, people often say "send to syslog".

Syslog protocol - A transport protocol that specifies how logs are transmitted over the network, together with a definition of the data format for syslog messages (see below). It is formally defined in RFC 5424. The standard port is 514 for plaintext logs and 6514 for encrypted logs. In this usage, people often say "send over syslog".

Syslog message - A log message or event in syslog format, which includes a header with several standard fields. In this usage, people often say "send a syslog".

A syslog message or event includes a header with several standard fields that make it easier to analyze and route. These include the timestamp, the name of the application, the facility (the category or location of the message's source in the system), and the priority of the event.

Shown below is a log message containing the syslog header from the sshd daemon that controls remote login to the system, which describes a failed login attempt:

The code is as follows:

<34>1 2003-10-11T22:14:15.003Z server1.com sshd - - pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.0.2.2

Syslog format and fields

Each syslog message contains a header whose fields are structured data, which makes events easier to analyze and route. Here is the format used to generate the syslog example above; you can match each value to the name of a specific field.

The code is as follows:

<%pri%>%protocol-version% %timestamp:::date-rfc3339% %HOSTNAME% %app-name% %procid% %msgid% %msg%\n
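To make the field order concrete, here is a minimal Python sketch that splits one line in this format into its header fields. The regular expression, the function name parse_syslog_line, and the sample line (including its leading <34> priority value) are assumptions made for illustration; it is not a complete RFC 5424 parser.

```python
import re

# Sample line modeled on the article's sshd example.
# The leading <34> priority value is an assumption for illustration.
LINE = (
    "<34>1 2003-10-11T22:14:15.003Z server1.com sshd - - "
    "pam_unix(sshd:auth): authentication failure; "
    "logname= uid=0 euid=0 tty=ssh ruser= rhost=10.0.2.2"
)

# One named group per field, in the same order as the format string.
HEADER = re.compile(
    r"^<(?P<pri>\d{1,3})>(?P<version>\d+) "
    r"(?P<timestamp>\S+) (?P<hostname>\S+) (?P<app_name>\S+) "
    r"(?P<procid>\S+) (?P<msgid>\S+) (?P<msg>.*)$"
)


def parse_syslog_line(line):
    """Return the header fields as a dict, or None if the line does not match."""
    match = HEADER.match(line)
    return match.groupdict() if match else None


fields = parse_syslog_line(LINE)
for name, value in fields.items():
    print(f"{name:>10}: {value}")
```

A hyphen in the procid or msgid position means the sender did not supply that field.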

Below are some of the syslog fields most commonly used when searching or troubleshooting:

Timestamp

The timestamp (2003-10-11T22:14:15.003Z in the above example) indicates the date and time at which the message was sent on the originating system. This may differ from the time at which the message is received on another system. The timestamp in the above example can be broken down into:

2003-10-11 is the year, month, and day.

T is a required element of the timestamp, which separates the date from the time.

22:14:15.003 is the time in 24-hour format, including the number of milliseconds (003) into the next second.

Z is an optional element that indicates UTC. Instead of Z, the timestamp can carry an offset such as -08:00, which means the time is 8 hours behind UTC, or PST.
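The breakdown above can be checked with a short Python sketch. The helper name parse_rfc3339 is hypothetical, and the second timestamp is an invented variant of the example that uses the -08:00 offset instead of Z.

```python
from datetime import datetime, timedelta, timezone


def parse_rfc3339(stamp):
    """Parse an RFC 3339 timestamp; a trailing Z is treated as +00:00."""
    if stamp.endswith("Z"):
        stamp = stamp[:-1] + "+00:00"
    return datetime.fromisoformat(stamp)


utc_time = parse_rfc3339("2003-10-11T22:14:15.003Z")
pst_time = parse_rfc3339("2003-10-11T22:14:15.003-08:00")

print(utc_time.isoformat())
# The same wall-clock reading at -08:00, converted to UTC for comparison.
print(pst_time.astimezone(timezone.utc).isoformat())
print(pst_time - utc_time)
```

Converting every timestamp to UTC before comparing events from different hosts avoids confusion when servers log in different time zones.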

Hostname

The hostname field (server1.com in the above example) is the name of the host or system that sent the message.

Application name

The application name field (sshd in the above example) is the name of the program that sent the message.

Priority

The priority field, abbreviated pri (the value in angle brackets at the very start of a syslog message, such as <34>), tells us how urgent or severe the event is. It combines two numeric fields: the facility and the severity. The severity ranges from 7, which represents a debug event, down to 0, which represents an emergency. The facility describes which kind of process created the event. It ranges from 0, which represents kernel messages, to 23, which is reserved for local application use (local7).

Pri has two output formats. The first is a single number, calculated by multiplying the facility by 8 and adding the severity: (facility × 8) + severity. The second is pri-text, which is output as the string "facility.severity". The latter is easier to read and search but takes up more storage space.
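The arithmetic is easy to verify in Python. In this sketch the function names are hypothetical, and the keyword tables use the conventional Linux facility and severity names, with facilities 12 to 15 left as plain numbers.

```python
# Conventional severity keywords, indexed by severity number (0 to 7).
SEVERITY_NAMES = ["emerg", "alert", "crit", "err",
                  "warning", "notice", "info", "debug"]

# Conventional facility keywords; numbers without an entry are printed as-is.
FACILITY_NAMES = {0: "kern", 1: "user", 2: "mail", 3: "daemon", 4: "auth",
                  5: "syslog", 6: "lpr", 7: "news", 8: "uucp", 9: "cron",
                  10: "authpriv", 11: "ftp"}
FACILITY_NAMES.update({16 + n: f"local{n}" for n in range(8)})


def encode_pri(facility, severity):
    """Combine facility and severity into the single pri number."""
    return facility * 8 + severity


def decode_pri(pri):
    """Split a pri number back into (facility, severity)."""
    return divmod(pri, 8)


def format_pri(pri):
    """Render a pri number in the facility.severity text form."""
    facility, severity = decode_pri(pri)
    name = FACILITY_NAMES.get(facility, str(facility))
    return f"{name}.{SEVERITY_NAMES[severity]}"


print(encode_pri(4, 2))   # facility 4 (auth), severity 2 (crit)
print(decode_pri(34))
print(format_pri(34))
```

For example, a pri value of 34 decodes to facility 4 and severity 2, because 4 × 8 + 2 = 34.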

Use logs in Linux to troubleshoot

Reasons for login failures

If you want to check whether your system is secure, review the authentication log for failed logins and for successful logins by suspicious users. Authentication failures occur when someone tries to log in with improper or invalid credentials, usually when using SSH for remote login or su to switch to another local user. These events are recorded by the Pluggable Authentication Modules (PAM) system. You will see strings like Failed password and user unknown in your log. Successful authentication records include strings such as Accepted password and session opened.

Examples of failures:

The code is as follows:

pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.0.2.2

Failed password for invalid user hoover from 10.0.2.2 port 4791 ssh2

pam_unix(sshd:auth): check pass; user unknown

PAM service(sshd) ignoring max retries; 6 > 3

Examples of success:

The code is as follows:

Accepted password for hoover from 10.0.2.2 port 4792 ssh2

pam_unix(sshd:session): session opened for user hoover by (uid=0)

pam_unix(sshd:session): session closed for user hoover

You can use grep to find out which user names have the most failed logins. These are the accounts that potential attackers are trying, and failing, to access. Here is an example on an Ubuntu system.

The code is as follows:

$ grep "invalid user" /var/log/auth.log | cut -d ' ' -f 10 | sort | uniq -c | sort -nr

23 oracle

18 postgres

17 nagios

10 zabbix

6 test
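The same count can be sketched in Python with a regular expression, which does not depend on the user name sitting at a fixed field position. The sample log lines, host name, and function name below are invented for illustration; real entries in /var/log/auth.log may differ in format.

```python
import re
from collections import Counter

# Hypothetical auth.log excerpt, made up for this example.
SAMPLE_LOG = """\
Mar 12 10:00:01 host sshd[1001]: Failed password for invalid user oracle from 10.0.2.2 port 4791 ssh2
Mar 12 10:00:05 host sshd[1002]: Failed password for invalid user oracle from 10.0.2.2 port 4792 ssh2
Mar 12 10:00:09 host sshd[1003]: Failed password for invalid user postgres from 10.0.2.2 port 4793 ssh2
Mar 12 10:00:12 host sshd[1004]: Accepted password for hoover from 10.0.2.2 port 4794 ssh2
"""

INVALID_USER = re.compile(r"Failed password for invalid user (\S+) from \S+")


def count_invalid_users(lines):
    """Count failed logins per unknown user name."""
    counts = Counter()
    for line in lines:
        match = INVALID_USER.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


counts = count_invalid_users(SAMPLE_LOG.splitlines())
# Most frequent first, like `sort | uniq -c | sort -nr`.
for user, total in counts.most_common():
    print(f"{total:7d} {user}")
```

To run it against a real log, pass an open file object instead of SAMPLE_LOG.splitlines().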

Since there is no standard format, you need a different command for each application's log. A log management system can parse logs automatically, classify them effectively, and extract key fields such as user names.

A log management system can use automatic parsing to extract the user name from Linux logs. This lets you see per-user information and filter on it with a click. In this example, the root user appears in 2,700 login attempts, because the log is filtered to show only the login attempts of the root user.

A log management system also lets you view a chart plotted over time, which makes anomalies easier to spot. If someone fails to log in once or twice within a few minutes, it may be a real user who forgot the password. However, if there are hundreds of failed logins using different user names, it is more likely an attempt to attack the system. Here, you can see that on March 12 someone tried to log in as nagios hundreds of times. This is obviously not a legitimate system user.
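Without a log management system, the same heuristic can be roughed out in Python: group failed logins by source address and flag sources that make many attempts with several different user names. The sample lines, the function names, and the thresholds (4 attempts, 3 distinct users) are all assumptions chosen to keep the example small.

```python
import re
from collections import defaultdict

# Hypothetical auth.log excerpt, made up for this example.
SAMPLE_LOG = """\
Mar 12 09:00:01 host sshd[2001]: Failed password for hoover from 10.0.2.2 port 4791 ssh2
Mar 12 09:02:10 host sshd[2002]: Failed password for hoover from 10.0.2.2 port 4792 ssh2
Mar 12 09:05:00 host sshd[2003]: Failed password for invalid user nagios from 10.0.9.9 port 5001 ssh2
Mar 12 09:05:01 host sshd[2004]: Failed password for invalid user oracle from 10.0.9.9 port 5002 ssh2
Mar 12 09:05:02 host sshd[2005]: Failed password for invalid user postgres from 10.0.9.9 port 5003 ssh2
Mar 12 09:05:03 host sshd[2006]: Failed password for invalid user test from 10.0.9.9 port 5004 ssh2
"""

FAILED = re.compile(r"Failed password for (?:invalid user )?(\S+) from (\S+)")


def summarize_failures(lines):
    """Map each source address to (attempts, distinct user names)."""
    users_by_source = defaultdict(list)
    for line in lines:
        match = FAILED.search(line)
        if match:
            user, source = match.groups()
            users_by_source[source].append(user)
    return {source: (len(users), len(set(users)))
            for source, users in users_by_source.items()}


def suspicious_sources(summary, min_attempts=4, min_users=3):
    """Return sources with many attempts spread over several user names."""
    return sorted(source for source, (attempts, users) in summary.items()
                  if attempts >= min_attempts and users >= min_users)


summary = summarize_failures(SAMPLE_LOG.splitlines())
print(summary)
print(suspicious_sources(summary))
```

Two failures for one known user look like a forgotten password, while four failures across four unknown user names from a single address look like guessing.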

Reason for restart

Sometimes, a server goes down due to a system crash or restart. How do you know when it happened and who did it?

Shutdown command

If someone runs the shutdown command manually, you can see it in the authentication log file. Here, you can see that someone logged in remotely as the ubuntu user from IP 50.0.134.125 and then shut down the system.

The code is as follows:

Mar 19 18:36:41 ip-172-31-11-231 sshd[23437]: Accepted publickey for ubuntu from 50.0.134.125 port 52538 ssh2

Mar 19 18:36:41 ip-172-31-11-231 sshd[23437]: pam_unix(sshd:session): session opened for user ubuntu by (uid=0)

Mar 19 18:37:09 ip-172-31-11-231 sudo: ubuntu : TTY=pts/1 ; PWD=/home/ubuntu ; USER=root ; COMMAND=/sbin/shutdown -r now
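To pull such entries out of a long authentication log, a small Python sketch like the following may help. The regular expression and function name are assumptions, and the sample lines are modeled on the entries above; sudo's exact log layout can vary between versions and configurations.

```python
import re

# Sample lines modeled on the authentication log entries shown above.
SAMPLE_LOG = """\
Mar 19 18:36:41 ip-172-31-11-231 sshd[23437]: Accepted publickey for ubuntu from 50.0.134.125 port 52538 ssh2
Mar 19 18:37:09 ip-172-31-11-231 sudo: ubuntu : TTY=pts/1 ; PWD=/home/ubuntu ; USER=root ; COMMAND=/sbin/shutdown -r now
"""

SUDO = re.compile(r"sudo:\s*(?P<user>\S+)\s*:.*COMMAND=(?P<command>.+)$")


def find_shutdown_commands(lines):
    """Return (user, command) pairs for sudo commands that stop or restart the host."""
    found = []
    for line in lines:
        match = SUDO.search(line)
        if match and re.search(r"\b(shutdown|reboot|halt|poweroff)\b",
                               match["command"]):
            found.append((match["user"], match["command"]))
    return found


shutdowns = find_shutdown_commands(SAMPLE_LOG.splitlines())
print(shutdowns)
```

Matching on the COMMAND= value rather than the whole line avoids false hits from user names or paths that happen to contain the word shutdown.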

Kernel initialization

If you want to see every time the server restarted, whatever the cause (including crashes), look for the kernel initialization messages. Search for messages from the kernel facility that contain Initializing, such as the cpu initialization lines.

The code is as follows:

Mar 19 18:39:30 ip-172-31-11-231 kernel: [    0.000000] Initializing cgroup subsys cpuset

Mar 19 18:39:30 ip-172-31-11-231 kernel: [    0.000000] Initializing cgroup subsys cpu

Mar 19 18:39:30 ip-172-31-11-231 kernel: [    0.000000] Linux version 3.8.0-44-generic (buildd@tipua) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)) #66~precise1-Ubuntu SMP Tue Jul 15 04:01:04 UTC 2014 (Ubuntu 3.8.0-44.66~precise1-generic 3.8.13.25)
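One way to list boot times is to look for the kernel's "Linux version" banner at uptime 0.000000, which is printed once per boot. This Python sketch assumes the classic syslog line layout shown above; the function name is hypothetical, and the sample lines are shortened copies of the article's entries plus one invented later kernel message.

```python
import re

# Shortened copies of the log lines above, plus one invented later message.
SAMPLE_LOG = """\
Mar 19 18:39:30 ip-172-31-11-231 kernel: [    0.000000] Initializing cgroup subsys cpuset
Mar 19 18:39:30 ip-172-31-11-231 kernel: [    0.000000] Linux version 3.8.0-44-generic (buildd@tipua)
Mar 19 18:45:12 ip-172-31-11-231 kernel: [  342.118230] some later kernel message
"""

BOOT = re.compile(
    r"^(?P<when>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"\[\s*0\.000000\]\s+Linux version (?P<version>\S+)"
)


def find_boots(lines):
    """Return (timestamp, kernel version) for each boot banner found."""
    boots = []
    for line in lines:
        match = BOOT.match(line)
        if match:
            boots.append((match["when"], match["version"]))
    return boots


boots = find_boots(SAMPLE_LOG.splitlines())
print(boots)
```

Comparing each boot time with the last shutdown command in the authentication log helps separate planned restarts from crashes.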

Detect memory problems

There are many reasons a server can crash, but one common cause is running out of memory.

When your system runs low on memory, processes are killed, usually starting with the one using the most resources. The error occurs when the system has used all of its memory and a new or existing process tries to allocate more. Look in your log files for strings like Out of memory or for kernel warnings that mention kill. These messages indicate that the system deliberately killed the process or application, rather than the process crashing on its own.

For example:

The code is as follows:

[33238.178288] Out of memory: Kill process 6230 (firefox) score 53 or sacrifice child

[29923450.995084] select 5230 (docker), adj 0, size 708, to kill

You can find these logs using tools like grep. This example is on Ubuntu:

The code is as follows:

$ grep "Out of memory" /var/log/syslog

[33238.178288] Out of memory: Kill process 6230 (firefox) score 53 or sacrifice child
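Once a matching line is found, its fields can be extracted for reporting. This Python sketch parses the exact message wording shown above; the function name is hypothetical, and the wording of out-of-memory messages varies between kernel versions, so the pattern may need adjusting on your system.

```python
import re

OOM_LINE = ("[33238.178288] Out of memory: Kill process 6230 (firefox) "
            "score 53 or sacrifice child")

OOM = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+Out of memory: "
    r"Kill process (?P<pid>\d+) \((?P<name>[^)]+)\) score (?P<score>\d+)"
)


def parse_oom(line):
    """Return the killed process details, or None if the line does not match."""
    match = OOM.search(line)
    if match is None:
        return None
    return {
        "uptime_seconds": float(match["uptime"]),
        "pid": int(match["pid"]),
        "name": match["name"],
        "score": int(match["score"]),
    }


victim = parse_oom(OOM_LINE)
print(victim)
```

The number in square brackets is the time in seconds since boot, so it has to be added to the boot time to get a wall-clock timestamp.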

Keep in mind that grep itself uses memory, so simply running grep on a system that is already short of memory can trigger another out-of-memory error. This is one more reason to store your logs centrally!

Scheduled task error log

The cron daemon is a scheduler that runs processes at specified dates and times. If a process fails to run or fails to complete, a cron error appears in your log files. Depending on your distribution, you can find these logs in /var/log/cron, /var/log/messages, or /var/log/syslog. There are many reasons why cron jobs fail. Typically, the problem lies in the process rather than in the cron daemon itself.

By default, the output of a cron job is emailed through Postfix. The log below shows that the message was sent. Unfortunately, you cannot see the contents of the email here.

The code is as follows:

Mar 13 16:35:01 PSQ110 postfix/pickup[15158]: C3EDC5800B4: uid=1001 from=

Mar 13 16:35:01 PSQ110 postfix/cleanup[15727]: C3EDC5800B4: message-id=

Mar 13 16:35:01 PSQ110 postfix/qmgr[15159]: C3EDC5800B4: from=, size=607, nrcpt=1 (queue active)

Mar 13 16:35:05 PSQ110 postfix/smtp[15729]: C3EDC5800B4: to=, relay=gmail-smtp-in.l.google.com[74.125.130.26]:25, delay=4.1, delays=0.26/0/2.2/1.7, dsn=2.0.0, status=sent (250 2.0.0 OK 1425985505 f16si501651pdj.5 - gsmtp)

Consider logging the standard output of your cron jobs to help you locate problems. Here is an example of how to use the logger command to redirect a cron job's standard output to syslog. Replace the echo command with your own script; helloCron can be set to any application name you like.

*/5 * * * * echo 'Hello World!' 2>&1 | /usr/bin/logger -t helloCron

The log entries it creates:

The code is as follows:

Apr 28 22:20:01 ip-172-31-11-231 CRON[15296]: (ubuntu) CMD (echo 'Hello World!' 2>&1 | /usr/bin/logger -t helloCron)

Apr 28 22:20:01 ip-172-31-11-231 helloCron: Hello World!
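When reviewing many such entries, it can help to extract which user ran which command. The Python sketch below parses the CRON line format shown above; the regular expression and function name are assumptions, and the sample line is modeled on the article's entry.

```python
import re

# Sample line modeled on the CRON entry shown above.
CRON_LINE = ("Apr 28 22:20:01 ip-172-31-11-231 CRON[15296]: (ubuntu) CMD "
             "(echo 'Hello World!' 2>&1 | /usr/bin/logger -t helloCron)")

CRON = re.compile(
    r"CRON\[(?P<pid>\d+)\]:\s+\((?P<user>[^)]+)\)\s+CMD\s+\((?P<command>.*)\)$"
)


def parse_cron_entry(line):
    """Return the pid, user, and command of a CRON CMD entry, or None."""
    match = CRON.search(line)
    if match is None:
        return None
    return {"pid": int(match["pid"]),
            "user": match["user"],
            "command": match["command"]}


entry = parse_cron_entry(CRON_LINE)
print(entry)
```

The CMD entry only records that cron started the job; the job's own output appears separately under the tag passed to logger.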

Each cron job logs differently, depending on the type of task and how it outputs data.
