What are the common faults and treatment methods of Linux operation and maintenance? 07/15 Update SLTechnology News&Howtos


2025-07-15 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 05/31 Report

Today I will talk about common faults in Linux operation and maintenance and how to handle them, a topic many people may not understand well. To make it easier to follow, the editor has summarized the content below. I hope you get something useful out of this article.

As Linux operators, we all run into problems or failures of one kind or another. Summing up experience, finding problems, and analyzing the causes of faults is a good habit for a Linux operations engineer. Every technical breakthrough involves both tedium and satisfaction, but if we keep working hard and accumulating experience, practice rewards us richly.

Below is a summary of failures I have run into in my projects, together with their solutions. See whether they resonate with you and help you.

First: FAQ Collection

1. Shell script does not execute

Question:

One day a colleague in R&D asked me to look at a shell script he had written. The script was very simple and had no obvious errors, yet running it reported ": bad interpreter: No such file or directory".

Seeing this error, I asked him whether he had written the script on Windows and uploaded it to the Linux server. He had.

Reason:

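In DOS/Windows, lines in a text file end with \r\n, while on *nix systems they end with \n. A text file edited in DOS/Windows therefore shows an extra ^M (carriage return) at the end of every line on *nix. The shebang line then names an interpreter whose path ends in ^M, and no such file exists.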

Resolve:

1) Rewrite the script under Linux.

2) Strip the carriage returns in vi: :%s/\r//g or :%s/^M//g (type ^M as Ctrl+V, Ctrl+M).

Tip: sh -x script_name runs the script and prints each command as it is executed, which helps when troubleshooting complex scripts.
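The same fix can be scripted. A minimal sketch, assuming only POSIX sh, grep, and tr (dos2unix does the same job where it is installed); the function names and the temporary demo script are hypothetical:

```shell
#!/bin/sh
# Sketch: detect and strip Windows (CRLF) line endings from a shell script.
# has_crlf / strip_crlf are illustrative helper names, not standard tools.

has_crlf() {
    # Succeeds if any line of the file contains a carriage return.
    grep -q "$(printf '\r')" "$1"
}

strip_crlf() {
    # Delete every carriage return. Writing back with cat (instead of mv)
    # keeps the original file's inode and permissions, e.g. its +x bit.
    tr -d '\r' < "$1" > "$1.tmp" && cat "$1.tmp" > "$1" && rm -f "$1.tmp"
}

demo=$(mktemp)
trap 'rm -f "$demo"' EXIT
# Simulate a script saved on Windows: every line ends in \r\n.
printf '#!/bin/sh\r\necho hello\r\n' > "$demo"

has_crlf "$demo" && echo "CRLF line endings found"
strip_crlf "$demo"
sh "$demo"    # prints: hello
```

Writing the cleaned text back over the original file, rather than renaming a new file into place, is a deliberate choice: a script that was already executable stays executable.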

2. Controlling crontab output

Question:

The /var/spool/clientmqueue directory occupies more than 100 GB of space.

Reason:

Programs run from cron produce output, which cron emails to the crontab owner. Because sendmail is not running, the messages pile up as files in /var/spool/clientmqueue, and over time they can fill the disk.

Resolve:

1) Delete manually: cd /var/spool/clientmqueue && ls | xargs rm -f

2) Permanent fix: append > /dev/null 2>&1 to each command in the crontab so its output is discarded.
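The order of the two redirections matters. A small sketch with a hypothetical job that writes to both streams shows which order actually silences it (setting MAILTO="" at the top of the crontab is another way to stop the mails in cronie and Vixie cron):

```shell
#!/bin/sh
# Sketch: why the redirect order in a cron entry matters. A typical entry
# (the job path is hypothetical) looks like:
#
#   30 2 * * * /usr/local/bin/cleanup.sh > /dev/null 2>&1
#
# Anything a cron job writes to stdout or stderr is mailed to the crontab
# owner, so both streams must be discarded.

noisy_job() {
    echo "normal output"
    echo "error output" >&2
}

# Correct order: stdout -> /dev/null first, then stderr -> the same place.
quiet=$( { noisy_job > /dev/null 2>&1; } 2>&1 )
# Wrong order: stderr is copied onto the *old* stdout before stdout moves.
leaky=$( { noisy_job 2>&1 > /dev/null; } 2>&1 )

echo "correct order leaks: '$quiet'"   # prints: correct order leaks: ''
echo "wrong order leaks: '$leaky'"     # prints: wrong order leaks: 'error output'
```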

3. telnet / ssh is slow

Question:

One day a colleague in R&D reported that accessing the memcached service on 10.52 from 10.50 was misbehaving and asked us to check the network, service, and system. The system and the service were both normal, and 10.50 could ping 10.52, but telnet from 10.50 to 10.52 was very slow. We also found that the nameserver configured on the machine was not working.

Reason:

When you telnet or FTP into a Linux box, the server does a reverse DNS lookup on your IP. If the configured nameserver does not answer, every connection waits for that lookup to time out before it proceeds.

Resolve:

1) Edit /etc/hosts so that the hostnames and IPs correspond.

2) Comment out the nameserver line in /etc/resolv.conf, or point it at a "live" nameserver.
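Fix 1) is easy to make idempotent, so running it twice does not duplicate the line. A minimal sketch; the helper name, IP, and hostname are hypothetical, and it writes to a temporary file rather than the real /etc/hosts:

```shell
#!/bin/sh
# Sketch: append an "IP hostname" line to a hosts file unless that exact
# pair is already present. Pointing it at /etc/hosts (as root) is up to you.

add_hosts_entry() {
    hosts_file=$1 ip=$2 name=$3
    # awk exits 0 only if some line starts with $ip and lists $name.
    if awk -v ip="$ip" -v name="$name" '
        $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }' "$hosts_file"; then
        return 0
    fi
    printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
}

hosts=$(mktemp)
trap 'rm -f "$hosts"' EXIT
printf '127.0.0.1\tlocalhost\n' > "$hosts"

add_hosts_entry "$hosts" 192.168.10.52 memcache-52
add_hosts_entry "$hosts" 192.168.10.52 memcache-52   # second call adds nothing
cat "$hosts"
```

For ssh specifically, if your sshd still performs reverse lookups, OpenSSH's UseDNS no in /etc/ssh/sshd_config turns them off.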

4. Read-only file system

Question:

When a colleague tried to create a table in MySQL, it failed with the following message:

mysql> create table wosontest (colddname1 char(1));

ERROR 1005 (HY000): Can't create table 'wosontest' (errno: 30)

The MySQL user privileges and the relevant directory permissions were fine. Running perror 30 explains the error code: OS error code 30: Read-only file system

Possible reasons:

1) File system corruption

2) A bad disk

3) An incorrect fstab configuration, such as the wrong file system type for a partition (writing ntfs as fat) or misspelled options.

Resolve:

1) Since it was a test machine, we simply restarted it and it recovered.

2) It is said online that it can also be fixed with mount, by remounting the file system read-write (mount -o remount,rw <mount point>).
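Before remounting, it helps to confirm which mount is actually read-only. A minimal sketch that reads the options column of a table in /proc/mounts format; the function name, sample devices, and mount points are hypothetical, and the table is a parameter so the logic can be tried on sample data:

```shell
#!/bin/sh
# Sketch: succeed if a mount point is mounted read-only ("ro" option).

is_read_only() {
    mounts_file=$1 mount_point=$2
    # Fields: device mountpoint fstype options dump pass.
    # If a mount point appears more than once, the last (topmost) entry wins.
    awk -v mp="$mount_point" '
        $2 == mp {
            seen = 1; ro = 0
            n = split($4, opt, ",")
            for (i = 1; i <= n; i++) if (opt[i] == "ro") ro = 1
        }
        END { exit !(seen && ro) }' "$mounts_file"
}

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sdb1 /data xfs ro,relatime 0 0
EOF

is_read_only "$sample" /data && echo "/data is read-only"
is_read_only "$sample" / || echo "/ is read-write"
# On a live system: is_read_only /proc/mounts /data
```

If the kernel switched the file system to read-only because of I/O or file system errors, check dmesg and run fsck before remounting it read-write.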

5. A file was deleted but its disk space was not released

Question:

One day df -h showed 90 GB used on a machine, while du -sh /* showed that all files together used only 30 GB.

Reason:

Someone probably used rm to delete a file that a process was still writing. The directory entry is gone, but the disk space is not released until every process holding the file open closes it.

Resolve:

1) The easiest fix is to restart the system or the related service.

2) End the process holding the file (or truncate the file through /proc):

/usr/sbin/lsof | grep deleted
ora 25575 data 33u REG 65,65 4294983680 /oradata/DATAPRE/UNDOTBS009.dbf (deleted)

The lsof output shows that the process with PID 25575 holds /oradata/DATAPRE/UNDOTBS009.dbf open on file descriptor (fd) 33.

Once we have found the file, we can free the occupied space by ending the process or, without stopping it, by truncating the file through its descriptor: echo > /proc/25575/fd/33

3) As a rule, empty a file that is still being written with cat /dev/null > file instead of deleting it with rm.
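Where lsof is not installed, the same information can be read straight from /proc on Linux: each /proc/PID/fd entry is a symlink whose target ends in " (deleted)" once the file has been unlinked. A sketch under that assumption; the function name is hypothetical, and it only sees processes you are allowed to inspect (run it as root to see all of them):

```shell
#!/bin/sh
# Sketch (Linux only): list open file descriptors that still point at
# deleted files, i.e. space that df counts but du cannot see.

list_deleted_open_files() {
    for fd in /proc/[0-9]*/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            *' (deleted)')
                pid=${fd#/proc/}; pid=${pid%%/*}
                printf '%s\t%s\t%s\n' "$pid" "${fd##*/}" "$target" ;;
        esac
    done
}

# Demo: this shell opens a file on fd 3, writes to it, then deletes it.
demo=$(mktemp)
exec 3>"$demo"
printf 'some data\n' >&3
rm -f "$demo"

list_deleted_open_files | grep -F "$demo"   # lines of: pid, fd, "<path> (deleted)"

# Free the space without ending the process: truncate via /proc/PID/fd/N.
: > "/proc/$$/fd/3"
exec 3>&-
```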

6. Improving find performance for file cleanup

Question:

The /tmp directory holds a large number of temporary files named picture_*, and files from previous days are cleaned up at 2:30 every night. I used to run the following script from crontab, but it was inefficient: the load spiked every time it ran, affecting other services.

#!/bin/sh
find /tmp -name "picture_*" -mtime +1 -exec rm -f {} \;

Reason:

The directory holds a huge number of files, and find ... -exec rm -f {} \; starts a separate rm process for every file it matches, which consumes a lot of resources.
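One way to cut the per-file cost while keeping find is to let it batch many paths into each rm call. A minimal sketch; the function name is hypothetical, the directory is a parameter so it can be tried on a scratch directory first, and -type f is an added safety assumption:

```shell
#!/bin/sh
# Sketch: nightly picture_* cleanup using "-exec rm -f {} +", which passes
# many file names to each rm invocation instead of forking one rm per file.

cleanup_pictures() {
    # -mtime +1 matches files last modified at least two full days ago,
    # because find truncates a file's age to whole 24-hour periods.
    find "$1" -type f -name 'picture_*' -mtime +1 -exec rm -f {} +
}

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
touch "$dir/picture_old.jpg" "$dir/picture_new.jpg" "$dir/other.txt"
# Back-date two files to 2000-01-01 so they count as old.
touch -t 200001010000 "$dir/picture_old.jpg" "$dir/other.txt"

cleanup_pictures "$dir"
ls "$dir"    # remaining: other.txt and picture_new.jpg
```

find's -delete action does the same without starting rm at all. Walking a directory with a very large number of entries still costs I/O either way, so scheduling the job off-peak remains worthwhile.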

Resolve:

#!/bin/sh
cd /tmp || exit 1
# ls -l space-pads the day of month ("Jul  5"), so match it with %e
time=$(date -d "2 day ago" "+%b %e")
ls -l | grep "picture" | grep "$time" | awk '{print $NF}' | xargs rm -rf

7. Unable to get the gateway MAC address

Question:

The network from 2.14 to 3.65 (mapped address 2.141) is down, but other machines on the 3.x segment can reach 3.65 normally.

Reason:

# arp
Address  HWtype  HWaddress  Flags Mask  Iface
192.168.3.254  ether  (incomplete)  CM  bond0

On the surface, the machine cannot obtain the gateway's MAC address automatically. The network engineer said it was a problem with the network equipment, but the details were unclear.

Resolve:

ARP binding: arp -i bond0 -s 192.168.3.254 00:00:5e:00:01:64
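The arp table above comes from net-tools. On systems using iproute2, ip neigh show reports the same thing, and an unresolved entry ends in INCOMPLETE or FAILED. A small sketch that picks those out; the function name and sample lines are hypothetical, and input comes from stdin so it can be checked against sample text:

```shell
#!/bin/sh
# Sketch: print neighbours (e.g. the gateway) whose MAC address could not
# be resolved, reading "ip neigh show" formatted lines on stdin.

incomplete_neighbors() {
    # Lines look like: "192.168.3.254 dev bond0 INCOMPLETE" or
    # "192.168.3.10 dev bond0 lladdr 00:11:22:33:44:55 REACHABLE".
    awk '$NF == "INCOMPLETE" || $NF == "FAILED" { print $1 }'
}

printf '%s\n' \
    '192.168.3.254 dev bond0 INCOMPLETE' \
    '192.168.3.10 dev bond0 lladdr 00:11:22:33:44:55 REACHABLE' |
    incomplete_neighbors    # prints: 192.168.3.254

# On a live system: ip neigh show | incomplete_neighbors
# iproute2 equivalent of the static binding above (run as root):
#   ip neigh replace 192.168.3.254 lladdr 00:00:5e:00:01:64 dev bond0 nud permanent
```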

8. An example of the httpd service failing to start

Question:

One day a colleague in R&D said that httpd in the website's front-end environment would not start, so I logged in to take a look. It reported the following error:

/etc/init.d/httpd start
Starting httpd: [Sat Jan 29 17:49:00 2011] [warn] module antibot_module is already loaded, skipping
Use proxy forward as remote ip: true.
Antibot exclude pattern: .*.[(js|css|jpg|gif|png)]
Antibot seed check pattern: login
(98)Address already in use: make_sock: could not bind to address [::]:7080
(98)Address already in use: make_sock: could not bind to address 0.0.0.0:7080
no listening sockets available, shutting down
Unable to open log
[FAILED]

Reason:

1) Port already in use: on the surface port 7080 appears to be occupied, but netstat -npl | grep 7080 showed that nothing was using 7080.

2) The port is declared more than once in the configuration: Listen 7080 was written in both of the following files:

/etc/httpd/conf/httpd.conf
/etc/httpd/conf.d/t.10086.cn.conf

Resolve:

Comment out Listen 7080 in /etc/httpd/conf.d/t.10086.cn.conf and restart httpd; it starts normally.
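With many files under conf.d, a duplicate Listen is easy to miss by eye. A rough sketch that reports Listen arguments declared more than once across the files you pass it; the function name is hypothetical, it ignores commented-out lines, does not follow Include directives, and compares arguments literally (so Listen 7080 and Listen 0.0.0.0:7080 count as different):

```shell
#!/bin/sh
# Sketch: report Listen values that appear in more than one active line.

duplicate_listens() {
    # Print "value<TAB>file" for each active Listen line, then group by value.
    awk 'tolower($1) == "listen" { print $2 "\t" FILENAME }' "$@" |
        sort |
        awk -F '\t' '{ n[$1]++; files[$1] = files[$1] " " $2 }
            END { for (v in n) if (n[v] > 1) print "Listen " v " in:" files[v] }'
}

conf_dir=$(mktemp -d)
trap 'rm -rf "$conf_dir"' EXIT
printf 'ServerRoot "/etc/httpd"\nListen 7080\n' > "$conf_dir/httpd.conf"
printf '#Listen 8080\nListen 7080\n' > "$conf_dir/t.example.conf"

duplicate_listens "$conf_dir/httpd.conf" "$conf_dir/t.example.conf"
# On a real host: duplicate_listens /etc/httpd/conf/httpd.conf /etc/httpd/conf.d/*.conf
```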

9. Too many open files

Question:

A "Too many open files" error is reported.
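Before raising limits, it helps to see how close the affected process really is to its current limit. A minimal sketch for Linux; the function name is hypothetical, and inspecting another user's process generally requires root:

```shell
#!/bin/sh
# Sketch: compare the number of file descriptors a process holds with its
# "Max open files" soft limit, both read from /proc/<pid>.

fd_usage() {
    pid=$1
    open=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l | tr -d ' ')
    # /proc/<pid>/limits line: "Max open files  <soft>  <hard>  files"
    soft=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
    echo "pid $pid: $open open files, soft limit $soft"
}

fd_usage $$
# For a service, pass its PID instead, e.g. the output of pidof httpd.
```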

Resolve:

Permanent solution:

echo "" >> /etc/security/limits.conf
echo "* soft nproc 65535" >> /etc/security/limits.conf
echo "* hard nproc 65535" >> /etc/security/limits.conf
echo "* soft nofile 65535" >> /etc/security/limits.conf
echo "* hard nofile 65535" >> /etc/security/limits.conf
echo "" >> /root/.bash_profile
echo "ulimit -n 65535" >> /root/.bash_profile
echo "ulimit -u 65535" >> /root/.bash_profile

Finally, restart the machine or run:

ulimit -u 65535 && ulimit -n 65535

10. Disk space problems caused by ibdata1 and mysql-bin

Question:

The 2.51 server raised a disk space alarm. Checking showed that ibdata1 and the mysql-bin logs took up too much space (ibdata1 exceeded 120 GB and the MySQL binlogs exceeded 80 GB).

Reason:

ibdata1 is InnoDB's shared tablespace file. For InnoDB tables it stores the data and indexes, while the table files in the directory named after the database hold only the table structure.

The InnoDB storage engine manages tablespaces in two ways:

1) Shared tablespaces (which can be split into multiple smaller tablespace files); this is what most of our databases currently use

2) Independent tablespaces, where each table has its own tablespace (disk file)

Each approach has its own advantages and disadvantages:

① Shared tablespaces:

Advantages:

The tablespace can be split into multiple files stored on different disks (the size of one tablespace file does not limit table size, since a table can be spread across different files).

Disadvantages:

All data and indexes are stored together, so as data grows the file becomes very large. Although it can be split into several smaller files, data and indexes from many tables are mixed together in the tablespace, so after a large number of deletions on one table, the tablespace is left with many gaps.

With shared tablespace management, space that has been allocated cannot be reclaimed: if the tablespace grows because of temporary index builds or temporary tables, dropping the related tables does not shrink it.

② Independent tablespaces:

Enable them in the configuration file (my.cnf) with innodb_file_per_table.

Features:

Each table has its own independent tablespace; the data and indexes of each table will exist in its own tablespace.

Advantages:

The disk space of a tablespace can be reclaimed: DROP TABLE releases it automatically, and for a table from which a large amount of data has been deleted, ALTER TABLE tbl_name ENGINE=InnoDB; reclaims the unused space.

Disadvantages:

If a single table grows very large, say beyond 100 GB, performance suffers. With a shared tablespace the data can be split across files, but that has its own problem: if a query's access range is large, it touches multiple files and is also slow.

With independent tablespaces, partitioned tables can ease the problem to some extent. In addition, when independent tablespaces are enabled, the innodb_open_files parameter needs to be tuned appropriately.

Resolve:

1) ibdata1 is too large: the only option is to export the databases with mysqldump, rebuild, and reimport.

2) The mysql-bin logs are too large:

① Delete manually:

Delete all logs before a given log file: mysql> PURGE MASTER LOGS TO 'mysql-bin.010';

Delete logs written before a given time: mysql> PURGE MASTER LOGS BEFORE '2010-12-22 13:00:00';

② Keep only N days of binlogs by setting this in /etc/my.cnf:

expire_logs_days = 30    # binary logs older than this many days are deleted automatically
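The date-based PURGE above can also be generated for a rolling window, as a manual counterpart of expire_logs_days. A sketch assuming GNU date (for -d) and the PURGE BINARY LOGS form, which MySQL accepts as a synonym of PURGE MASTER LOGS; the function name is hypothetical, and the mysql login in the comment is an assumption about your setup:

```shell
#!/bin/sh
# Sketch: print a statement that purges binary logs older than N days.

purge_stmt() {
    days=$1
    cutoff=$(date -d "$days days ago" '+%Y-%m-%d %H:%M:%S')
    printf "PURGE BINARY LOGS BEFORE '%s';\n" "$cutoff"
}

purge_stmt 30
# To apply (assumed credentials): purge_stmt 30 | mysql -u root -p
```

Before purging on a primary, check with SHOW SLAVE STATUS on each replica that none of them still needs the older logs.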

Summary of troubleshooting
