2025-02-25 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
When troubleshooting a Linux system, the symptoms of a fault are the easiest thing to spot, but identifying the cause is the key to resolving it. Familiarity with the common log files in Linux, and with how typical faults are analyzed and resolved, helps administrators locate the point of failure quickly, apply the right remedy, and fix system problems promptly.
Blog outline:
First, analyze log files
Second, troubleshoot the system startup
Third, troubleshoot the file system
First, analyze log files
Log files are files used to record various running messages in the Linux operating system, which is equivalent to the "diary" of the Linux host. Different log files record different types of information, such as Linux kernel messages, user login events, program errors, and so on.
Log files are very helpful in diagnosing and resolving problems, because programs running on Linux usually write system messages and error messages to the corresponding log files, so there is a record to consult if something goes wrong. Log files can also help uncover the traces left by an intruder when the host has been compromised.
1. Primary log file
In the Linux operating system, there are three main types of log data:
Kernel and system logs: this log data is managed centrally by the rsyslog system service, which decides where to record kernel messages and the messages of various system programs according to the settings in its main configuration file, /etc/rsyslog.conf. A considerable number of programs hand their logging over to rsyslog, so the log records of these programs share a similar format.
User logs: this log data records information about users logging in to and out of the system, including user name, login terminal, login time, source host, processes in use, and so on.
Program logs: some applications manage their own log files (rather than handing them over to the rsyslog service) to record events that occur while the program runs. Because each of these programs is responsible only for its own log files, the logging formats used by different programs may vary greatly.
The log files of the Linux operating system itself and of most server programs are placed in the /var/log/ directory by default. Some programs share a log file and some use a file of their own, while some large services that write more than one log file create their own subdirectory under /var/log/. This keeps the log directory clearly structured and makes individual log files quick to locate. A considerable number of log files are readable only by root, which protects the information they contain.
It is worth knowing what each log file is used for, so that problems can be found and solved more quickly.
Some common log files are shown in the figure:
2. Log file analysis
The purpose of analyzing the log file is to browse the log to find the key information, debug the system service, and determine the cause of the failure.
Most log files are plain text (kernel and system logs, and most program logs) and can be viewed with tail, more, cat, less and similar commands. Some special binary log files (such as user logs) must be read with dedicated query commands.
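As a rough sketch of the text-log case, the helper below combines grep and tail to show the most recent lines that match a keyword. The function name recent_matches and the sample log lines are made up for illustration; on a real host the file argument would be something like /var/log/messages.

```shell
#!/bin/sh
# recent_matches FILE PATTERN [COUNT]
# Print the last COUNT (default 5) lines of FILE that match PATTERN,
# ignoring case. Hypothetical helper, not part of any standard tool.
recent_matches() {
    file=$1
    pattern=$2
    count=${3:-5}
    grep -i -- "$pattern" "$file" | tail -n "$count"
}

# Demonstration on a small made-up log so the sketch runs without root access.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Sep 10 00:10:01 localhost systemd: Started Session 2 of user root.
Sep 10 00:12:44 localhost kernel: I/O error on device sdb1
Sep 10 00:13:02 localhost sshd[1201]: error: connection reset
Sep 10 00:15:30 localhost crond[880]: (root) CMD (run-parts /etc/cron.hourly)
EOF

# Show only the most recent line that mentions "error".
recent_matches "$sample" error 1
rm -f "$sample"
```

For a log that is still being written, tail -f on the file serves the same purpose interactively.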
(1) Kernel and system log
The configuration file used by the rsyslog service is /etc/rsyslog.conf.
[root@localhost ~]# grep -v "^$" /etc/rsyslog.conf
// filter blank lines
# rsyslog configuration file
# For more information see /usr/share/doc/rsyslog-*/rsyslog_conf.html
# If you experience problems, see http://www.rsyslog.com/doc/troubleshoot.html
#### MODULES ####
# The imjournal module bellow is now used as a message source instead of imuxsock.
$ModLoad imuxsock # provides support for local system logging (e.g. via logger command)
...
// omit part of the content
It can be seen from the configuration file that the log files managed by rsyslog service are the main log files in the Linux operating system, which record the basic system messages in the Linux operating system, such as kernel, user authentication, e-mail, scheduled tasks and so on. In the Linux kernel, log messages are divided into different priorities according to their importance (the smaller the number, the higher the priority, and the more important the message). As shown in the figure:
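The priority figure is not reproduced here. As a hedged stand-in, the sketch below prints the standard syslog severity numbering (0 is the most urgent, 7 the least). The function name priority_num is hypothetical; the keywords are the ones normally used in rsyslog selectors.

```shell
#!/bin/sh
# priority_num NAME
# Map a syslog severity keyword to its numeric level (0 = most urgent).
priority_num() {
    case $1 in
        emerg)   echo 0 ;;
        alert)   echo 1 ;;
        crit)    echo 2 ;;
        err)     echo 3 ;;
        warning) echo 4 ;;
        notice)  echo 5 ;;
        info)    echo 6 ;;
        debug)   echo 7 ;;
        *)       echo "unknown priority: $1" >&2; return 1 ;;
    esac
}

# Print the whole table, most urgent first.
for p in emerg alert crit err warning notice info debug; do
    printf '%s %s\n' "$(priority_num "$p")" "$p"
done
```

A selector in /etc/rsyslog.conf names a facility and one of these keywords, and by default it matches messages of that priority and anything more urgent.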
Kernel messages and most system messages are recorded in the common log file /var/log/messages, while other program messages are recorded in separate log files. Log messages can also be written to specific storage devices or sent directly to designated users.
For most log files managed uniformly by the rsyslog service, the logging format is basically the same. Take the record format of the public log / var/log/messages file as an example, where each line represents a log message, and each message includes the following four fields:
Timestamp: the date and time when the message was sent;
Hostname: the name of the computer that generated the message;
Subsystem name: the name of the application that sent the message;
Message: the specific content of the message.
In some cases, rsyslog can be configured to send log information to a printer while also recording it in a file, so that however an intruder modifies the log files, the printed trace cannot be removed. The rsyslog service is an obvious target for attackers, and when it is compromised administrators find it hard to detect the intrusion, so pay special attention to monitoring its daemon and configuration files.
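To make the four fields concrete, here is a minimal awk sketch that splits one line in this format. The function name parse_syslog_line and the sample line are illustrative assumptions; the sketch also assumes the traditional "Mon DD HH:MM:SS" timestamp and collapses repeated spaces inside the message.

```shell
#!/bin/sh
# parse_syslog_line
# Read lines in the traditional syslog file format from standard input
# and print the four fields on separate lines.
parse_syslog_line() {
    awk '{
        ts = $1 " " $2 " " $3          # timestamp: month, day, time
        host = $4                      # hostname
        prog = $5                      # subsystem name, e.g. "sshd[1201]:"
        sub(/:$/, "", prog)            # drop the trailing colon
        msg = ""
        for (i = 6; i <= NF; i++)      # the rest of the line is the message
            msg = msg (i > 6 ? " " : "") $i
        printf "time=%s\nhost=%s\nprogram=%s\nmessage=%s\n", ts, host, prog, msg
    }'
}

echo 'Sep 10 00:10:01 localhost systemd: Started Session 2 of user root.' |
    parse_syslog_line
```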
(2) user log
The wtmp, btmp, lastlog and similar log files record user logins, logouts and related events. These are binary data files, so they cannot be read directly with text tools such as tail and less. Use the user query commands who, w, users, last and lastb to obtain the log information instead.
1) query the current login users-users, who, w commands
The users command simply prints the names of the users currently logged in, one name per login session. If a user has more than one login session, the user name is displayed that many times. For example:
[root@localhost ~]# users
(unknown) root root root
The who command reports information about each user currently logged in to the system. With it, the administrator can see whether any unauthorized users are on the system and audit or deal with them. The default output of who includes the user name, terminal, login date and time, and remote host. For example:
[root@localhost ~]# who
(unknown) :0           2019-09-10 00:01 (:0)
root     tty2         2019-09-10 00:10
root     pts/0        2019-09-09 16:25 (192.168.1.253)
root     tty3         2019-09-09 16:42
The w command displays each logged-in user together with the process that user is currently running, which is richer than the output of the users and who commands. For example:
[root@localhost ~]# w
 16:49:29 up 48 min,  4 users,  load average: 0.00, 0.01, 0.05
USER     TTY      FROM             LOGIN@   IDLE   JCPU   PCPU WHAT
root     tty2                      00:10    ?      0.84s  0.84s -bash
root     pts/0    192.168.1.253    16:25    1.00s  0.10s  0.06s w
root     tty3                      16:42    ?      0.08s  0.03s -bash
2) query user login history-last, lastb commands
The last command is used to query the user records that successfully logged in to the system, with the most recent login displayed at the top.
[root@localhost ~]# last
xiaoli   tty3                          Thu Sep 12 04:49   still logged in
root     pts/0        192.168.1.253    Thu Sep 12 04:47   still logged in
root     tty2                          Thu Sep 12 04:46   still logged in
The lastb command queries the records of failed login attempts, such as those caused by an incorrect user name or password.
[root@localhost ~]# lastb
xiaowang tty3                          Thu Sep 12 04:52 - 04:52  (00:00)
xiaoli   tty3                          Thu Sep 12 04:52 - 04:52  (00:00)
In addition to the lastb command, you can check the security log /var/log/secure. Tools such as Webalizer and AWStats, which are aimed mainly at web server logs, can also present logs graphically, which makes them easier to read.
(3) Program log
In the Linux operating system, a considerable number of applications do not use the rsyslog service to manage their logs but maintain their own log records, for example the httpd web service. The logging formats of different applications vary greatly, and there is no strictly uniform format.
A competent system administrator should stay vigilant, watch for suspicious activity, and check the various system log files both regularly and at random, including general information logs, network connection logs, file transfer logs and user login logs. When checking these logs, look for records with unusual times or unusual operations.
Pay particular attention to any of the following situations:
Users logging in at unusual times, or from an IP address that differs from their usual one;
Records of failed logins, especially repeated failed attempts to enter the system;
Illegal or improper use of superuser privileges;
Unexplained or unauthorized restarts of network services;
Abnormal log records, such as incomplete logs, or log files such as wtmp that are missing intermediate records for no reason.
Administrators should also bear in mind that logs are not completely reliable: a skilled attacker will clean up after entering the system. Use the commands above together to carry out a thorough review, and do not judge from a single record taken out of context, or you may reach the wrong conclusion.
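Checks like the repeated-failure case above can be partly automated. The sketch below counts failed password attempts per source address. It assumes the message wording commonly written by OpenSSH ("Failed password for ... from ADDRESS port ..."); the function name failed_logins_by_ip and the sample lines are made up for illustration, and on a real host the input would typically be /var/log/secure.

```shell
#!/bin/sh
# failed_logins_by_ip [FILE...]
# Count "Failed password" lines per source address, busiest address first.
failed_logins_by_ip() {
    awk '/Failed password/ {
        # The address is the word that follows "from".
        for (i = 1; i < NF; i++)
            if ($i == "from") { count[$(i + 1)]++; break }
    }
    END { for (ip in count) print count[ip], ip }' "$@" |
        sort -k1,1nr -k2,2
}

# Demonstration on made-up sample lines.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Sep 12 04:52:01 localhost sshd[3001]: Failed password for invalid user xiaowang from 192.168.1.50 port 40022 ssh2
Sep 12 04:52:05 localhost sshd[3001]: Failed password for root from 192.168.1.50 port 40022 ssh2
Sep 12 04:53:10 localhost sshd[3010]: Failed password for root from 192.168.1.77 port 51234 ssh2
Sep 12 04:54:00 localhost sshd[3015]: Accepted password for root from 192.168.1.253 port 50022 ssh2
EOF
failed_logins_by_ip "$sample"
rm -f "$sample"
```

An address with many failures in a short period is worth comparing against the lastb output.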
Second, troubleshoot the system startup
The startup process of the Linux operating system involves the MBR, the GRUB boot menu, the system initialization configuration files and more. A failure in any of them may prevent the system from starting normally, so the related files must be backed up.
1. MBR sector failure
The MBR sector consists of three parts:
Boot loader code: the first stage of the boot program (GRUB on this system), occupying 446 bytes;
Partition table: up to four primary partition entries of 16 bytes each, 64 bytes in total;
End flag: 2 bytes marking the end of the sector.
The MBR is located in the first sector (512 bytes) of the physical hard disk, which is why that sector is called the master boot sector (MBR sector). It contains not only part of the system boot program but also the partition table of the whole hard disk. When the master boot sector fails, the host will most likely be unable to load the boot menu, or unable to find the correct partition and load the system, and booting from the hard disk is likely to end in a black screen or a hang.
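The three parts add up to exactly one 512-byte sector, and the end flag can be inspected directly. The sketch below does the arithmetic and then checks the last two bytes of a sector image for the conventional end-flag value 0x55 0xAA. That value is not stated in the text above and is added here as background; the function name has_mbr_signature is hypothetical, and the demonstration uses a temporary file rather than a real disk.

```shell
#!/bin/sh
# MBR layout: boot code + partition table + end flag.
boot_code=446
entries=4
entry_size=16
end_flag=2
echo "MBR size: $((boot_code + entries * entry_size + end_flag)) bytes"

# has_mbr_signature FILE
# Succeeds if bytes 510-511 of FILE are 0x55 0xAA.
has_mbr_signature() {
    flag=$(dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$flag" = "55aa" ]
}

# Demonstration on a 512-byte image file rather than a real disk.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=512 count=1 2>/dev/null
printf '\125\252' | dd of="$img" bs=1 seek=510 conv=notrunc 2>/dev/null
if has_mbr_signature "$img"; then
    echo "end flag present"
else
    echo "end flag missing"
fi
rm -f "$img"
```

The same function could be pointed at a saved copy of the sector, such as a backup file, to confirm that it looks like a boot sector before restoring it.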
We begin the process of backing up, simulating damage, and repairing the MBR sector with the following example:
(1) backup MBR sector data
Because the MBR sector contains the partition table records of the whole hard disk, the backup files of the sector must be stored in other storage devices, otherwise the backup files will not be read during recovery. For example:
[root@localhost ~]# mkdir /backup
[root@localhost ~]# mount /dev/sdb1 /backup
[root@localhost ~]# dd if=/dev/sda of=/backup/sda.mbr.bak bs=512 count=1
// use the dd command to back up the MBR sector data from the first hard disk to the sdb1 partition of the second hard disk
For more information about hard disk partitions, please refer to the blog post: Linux disk and file system management (1)
(2) simulate MBR sector failure
[root@localhost ~]# dd if=/dev/zero of=/dev/sda bs=512 count=1
// overwrite the original MBR sector data with zeros read from /dev/zero
When the system is rebooted, an "Operating system not found" message appears, indicating that no usable operating system can be found, so the host cannot start.
(3) restore MBR sector data from backup files
Once the MBR sector has been destroyed, the host can no longer boot from that hard disk. You must boot with an operating system on another hard disk or from the CentOS installation disc. Whichever you choose, the goal is the same: to obtain a shell environment in which you can run commands and restore the MBR sector from the backup file.
We use booting from the installation disc as an example, as shown in the figure:
After you have done this, a bash environment with a "sh-4.2#" prompt appears, as shown in the figure:
After completing the restore operation, execute the "exit" command to exit the current temporary shell environment, and the system will restart automatically!
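The restore commands themselves appeared only in the figures. As a hedged illustration of the whole cycle (back up, damage, restore), the sketch below rehearses it on an ordinary image file, so it can be run safely without touching /dev/sda. The function names mbr_backup and mbr_restore and the file names are assumptions made for the example.

```shell
#!/bin/sh
# mbr_backup DISK BACKUP: copy the first 512-byte sector of DISK to BACKUP.
mbr_backup() {
    dd if="$1" of="$2" bs=512 count=1 2>/dev/null
}

# mbr_restore BACKUP DISK: write the saved sector back to the start of DISK.
# conv=notrunc keeps an image file from being truncated; on a real block
# device it is harmless.
mbr_restore() {
    dd if="$1" of="$2" bs=512 count=1 conv=notrunc 2>/dev/null
}

work=$(mktemp -d)
disk="$work/disk.img"        # stand-in for /dev/sda
backup="$work/sda.mbr.bak"   # stand-in for /backup/sda.mbr.bak

# Build a 4-sector image with a recognisable marker in the first sector.
dd if=/dev/zero of="$disk" bs=512 count=4 2>/dev/null
printf 'BOOT-SECTOR-MARKER' | dd of="$disk" conv=notrunc 2>/dev/null

mbr_backup "$disk" "$backup"
dd if=/dev/zero of="$disk" bs=512 count=1 conv=notrunc 2>/dev/null   # simulate damage
mbr_restore "$backup" "$disk"

# Compare the first sector of the image with the backup.
if dd if="$disk" bs=512 count=1 2>/dev/null | cmp -s - "$backup"; then
    echo "first sector restored"
else
    echo "first sector differs from backup"
fi
rm -rf "$work"
```

In the rescue shell the equivalent restore would read from the backup file on the mounted second disk and write to the damaged disk, with the device names used earlier in this section.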
2. GRUB boot menu failure
GRUB is the default boot loader of most Linux operating systems, and its boot menu lets you choose which operating system to start (if more than one is installed). When the configuration file /boot/grub2/grub.cfg is missing, when a critical setting in it is wrong, or when the boot loader code in the MBR is damaged, the Linux host may fail to boot.
Simulate the failure by renaming the GRUB configuration file (the renamed copy also serves as a backup):
[root@localhost ~]# vim /boot/grub2/grub.cfg
[root@localhost ~]# mv /boot/grub2/grub.cfg /boot/grub2/grub.cfg.bak
// rename the original grub configuration file
After a reboot, the Linux host may stop at a "grub>" prompt and be unable to continue the startup process, as shown in the figure:
Solution:
Enter rescue mode, as shown in the figure:
After completing the above operations, a bash environment with a "sh-4.2#" prompt appears, as shown in the figure:
If the host was booted from the CD-ROM for the repair, switch it back to booting from the hard disk once the changes are complete, or choose to boot from the local hard disk when the CD boot prompt appears, as shown in the figure:
After the completion of the operation, the system can start normally!
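The rescue-shell commands for this repair appeared only in the figures. The sketch below shows one common approach on CentOS 7, not necessarily the one the figures used: reinstall the boot loader and regenerate grub.cfg with grub2-install and grub2-mkconfig inside the installed system. It assumes the rescue environment has mounted the installed system at /mnt/sysimage and that the boot disk is /dev/sda. The run and rebuild_grub helpers are hypothetical, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Print commands instead of running them unless DRY_RUN=0 is set.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# rebuild_grub [SYSROOT] [DISK]
# SYSROOT: where the rescue environment mounted the installed system
#          (assumed to be /mnt/sysimage)
# DISK:    the disk that holds the MBR (assumed to be /dev/sda)
rebuild_grub() {
    sysroot=${1:-/mnt/sysimage}
    bootdisk=${2:-/dev/sda}
    run chroot "$sysroot" grub2-install "$bootdisk" &&
    run chroot "$sysroot" grub2-mkconfig -o /boot/grub2/grub.cfg
}

rebuild_grub
```

If a known-good copy such as grub.cfg.bak exists, renaming it back to grub.cfg is a simpler alternative to regenerating the file.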
Note: CentOS 7 uses GRUB 2, whose configuration file differs considerably from that of the original GRUB, so be sure to back up grub.cfg so that it can be restored.
If you have forgotten the root password, please refer to the blog post: necessary measures taken to forget the root password of the Linux system.
MBR sector failures, GRUB boot menu failures and a forgotten root password can all be repaired from rescue mode.
Third, troubleshoot the file system
The data stored in file systems and on disks is of inestimable value, and one of the administrator's duties is to keep that data safe. Disks are fragile and it is impossible to predict when one will fail, so the best protection is a complete backup mechanism. With backups in place, there is no need to panic when a file system or disk failure occurs.
1. Repair the file system
On a Linux host, the file system may be corrupted by an abnormal shutdown, a sudden power outage, abnormal reads and writes on the device, and so on. The most common case is superblock corruption. The superblock is the core "file" of the file system, recording its type, size, free disk blocks and other information.
When the superblock of a file system is corrupted, Linux cannot recognize the file system, and an error appears at mount time so that it cannot be used. The following commands simulate damage to the superblock of the file system on /dev/sdb1:
[root@localhost ~]# dd if=/dev/zero of=/dev/sdb1 bs=512 count=4
4+0 records in
4+0 records out
2048 bytes (2.0 kB) copied, 0.000868901 s, 2.4 MB/s
[root@localhost ~]# mkdir /a
[root@localhost ~]# mount /dev/sdb1 /a
mount: /dev/sdb1 is write-protected, mounting read-only
mount: unknown filesystem type '(null)'
[root@localhost ~]# vim /etc/fstab
...
// omit part of the content
/dev/sdb1 /a xfs defaults 0 0
// mount automatically at boot
After rebooting the system, the following error occurs, as shown in the figure:
Repair completed!
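The repair commands were shown only in the figure. As a small illustration of what the dd command above destroys, the sketch below checks whether a device or image still begins with the XFS superblock magic "XFSB". The check is rehearsed on a temporary file that merely carries the magic bytes, not on a real file system; the names has_xfs_magic and report are hypothetical.

```shell
#!/bin/sh
# has_xfs_magic FILE
# Succeeds if FILE starts with the 4-byte XFS superblock magic "XFSB"
# (hex 58 46 53 42).
has_xfs_magic() {
    magic=$(dd if="$1" bs=1 count=4 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "58465342" ]
}

report() {
    if has_xfs_magic "$1"; then
        echo "superblock magic present"
    else
        echo "superblock magic missing"
    fi
}

img=$(mktemp)                      # stand-in for /dev/sdb1
dd if=/dev/zero of="$img" bs=512 count=8 2>/dev/null
printf 'XFSB' | dd of="$img" conv=notrunc 2>/dev/null
report "$img"

# Same damage as in the article: zero the first four 512-byte sectors.
dd if=/dev/zero of="$img" bs=512 count=4 conv=notrunc 2>/dev/null
report "$img"
rm -f "$img"
```

On a real XFS partition the usual repair tool is xfs_repair, run against the unmounted device; it can search for backup copies of the superblock when the primary one is unreadable.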
2. Disk resource exhaustion failure
It is obvious that when a file system runs out of disk space, it will not be possible to continue to create new file data in that partition, resulting in a failure.
When you cannot boot into the Linux operating system because the root partition is out of disk space, you can enter rescue mode and clean up the files that take up the most space. You can use the command "dd if=/dev/zero of=/a bs=1M count=999999" to simulate the failure.
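Finding what to clean up usually starts with a size ranking. The sketch below lists the largest entries directly under a directory using du, sort and head. The function name largest_entries is an assumption, hidden files are not included, and the demonstration runs on a temporary directory.

```shell
#!/bin/sh
# largest_entries [DIR] [N]
# Print the N (default 5) largest files or subdirectories directly under
# DIR, as "size-in-KiB<TAB>path", largest first. Names starting with a
# dot are skipped because the shell glob does not match them.
largest_entries() {
    dir=${1:-.}
    n=${2:-5}
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$n"
}

# Demonstration on a throwaway directory.
demo=$(mktemp -d)
dd if=/dev/urandom of="$demo/big.log" bs=1024 count=300 2>/dev/null
dd if=/dev/urandom of="$demo/small.log" bs=1024 count=10 2>/dev/null
largest_entries "$demo" 2
rm -rf "$demo"
```

Running it against directories such as /var/log or /tmp is a reasonable first step when the root partition is full.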
In addition, the number of files each file system can hold (which corresponds to its number of inodes) is also limited. The number of inodes is determined when the file system is formatted. If a user deliberately uses up all the inodes, no new files can be created even though the partition still has plenty of free space.
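Inode usage can be watched in the same way as disk space. The sketch below filters output in the format of "df -i" and prints the mount points whose inode usage is at or above a threshold. The function name inode_alert and the sample figures are made up for illustration; mount points containing spaces are not handled.

```shell
#!/bin/sh
# inode_alert [THRESHOLD]
# Read "df -i" style output on standard input and print
# "mount-point usage" for file systems at or above THRESHOLD percent
# (default 90). Lines without a numeric percentage are skipped.
inode_alert() {
    threshold=${1:-90}
    awk -v t="$threshold" 'NR > 1 {
        pct = $5
        sub(/%$/, "", pct)
        if (pct ~ /^[0-9]+$/ && pct + 0 >= t) print $6, $5
    }'
}

# Demonstration on made-up figures in df -i layout.
inode_alert 90 <<'EOF'
Filesystem      Inodes    IUsed    IFree IUse% Mounted on
/dev/sda1       524288   120000   404288   23% /boot
/dev/sdb1     10485248 10485248        0  100% /a
EOF
```

On a live system the input would come from a pipeline such as "df -Pi | inode_alert 90".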
Learn about this through an example:
(1) simulate inode exhaustion failure
[root@localhost ~]# mkdir /a
[root@localhost ~]# mount /dev/sdb1 /a
[root@localhost ~]# df -i /a
Filesystem      Inodes IUsed    IFree IUse% Mounted on
/dev/sdb1     10485248     3 10485245    1% /a
Write a small script that uses up the inodes. The script content is as follows:
[root@localhost ~]# vim a.sh
#!/bin/bash
i=1
while [ $i -le 10485245 ]
do
touch /a/file$i
let i++
done
[root@localhost ~]# sh a.sh &
[root@localhost ~]# df -i /a
Filesystem      Inodes IUsed    IFree IUse% Mounted on
/dev/sdb1     10485248     3 10485245    1% /a
[root@localhost ~]# touch /a/newfile
touch: cannot touch '/a/newfile': No space left on device
[root@localhost ~]# df -hT /a
Filesystem          Type  Size  Used Avail Use% Mounted on
/dev/mapper/cl-root xfs    17G  4.5G   13G  26% /
(2) repair inode exhaustion failure
[root@localhost ~]# rm -rf /a/file*
// with millions of files the expanded file list may exceed the shell's argument-length limit; find /a -name 'file*' -delete avoids that
3. Detect bad sectors on the hard disk
Bad sectors are divided into logical and physical bad sectors. Logical bad sectors are caused by improper software operation and can be repaired with software tools. Physical bad sectors are physical damage, and the only remedy is to change the layout of the disk partitions or sectors so that the space containing the bad blocks is excluded. If the disk shows any of the following symptoms, it may have bad sectors and needs to be checked and repaired:
The disk makes abnormal noises while data is being read;
Accessing a file causes repeated reads and errors, with messages saying the file is corrupted;
A newly created partition cannot be formatted;
The system crashes frequently while using the disk.
Once bad sectors appear, they tend to multiply if the disk is not replaced or treated in time, which may lead to frequent crashes and data loss, so the disk should be checked regularly for bad sectors.
On Linux, the badblocks command can be used to scan a disk for bad sectors. The "-s" option displays progress information and the "-v" option displays details.
[root@localhost ~]# badblocks -sv /dev/sdb
Over the long life of a computer, file system and disk failures are hard to avoid completely. Repairing them calls for great care, because a wrong step may make the data damage worse. When bad sectors are found, stop the application services on the system as soon as possible and back up the relevant data. If necessary, shut the system down immediately to keep the damage from spreading and avoid greater losses. A hard disk with bad sectors should be replaced with a healthy one.