Detecting bad sectors and bad blocks on a hard disk under Linux

2025-04-17 Update. From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

Disk bad sector detection

Check the disk for bad sectors when any of the following symptoms appear:

I/O wait increases or stays high for no apparent reason.

The sound of the hard disk suddenly changes from its normal friction noise to a strange noise.

The system cannot boot normally and shows messages such as "IO error".

mkfs stalls at a certain point and finally fails with an error.

fsck runs a disk scan for errors on every boot.

When running fdisk on the disk, progress repeatedly moves back and forth at a certain point.

If any of the above occurs, back up the data immediately, then check the disk for bad sectors and test whether it is still usable.

1. Finding bad sectors on the hard drive

When a hard disk has bad sectors, the output of dmesg usually contains messages such as "Buffer I/O error". Check the output of dmesg regularly so that hard disk problems are found in time.
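A minimal sketch of how this check could be scripted. The function name find_disk_errors and the two patterns are assumptions for illustration, not a complete list of kernel messages; the function only filters text on standard input, so it can be fed from dmesg.

```shell
#!/bin/sh
# find_disk_errors: hypothetical helper that filters kernel log text on stdin
# and prints only the lines that look like disk I/O errors.
# The patterns are illustrative assumptions, not an exhaustive list.
find_disk_errors() {
    grep -i -E 'i/o error|medium error'
}

# Example usage (reading the kernel log may require root):
#   dmesg | find_disk_errors
```

Because the match is case-insensitive, both "Buffer I/O error" and "I/O Error" style messages are caught.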

2. Detecting bad sectors

2.1 Use fdisk to list information on all disks or flash storage

# fdisk -l /dev/sd*

2.2 Use badblocks to check for bad sectors/blocks on the Linux hard drive. Bad sectors can sometimes be repaired, but only logical ones; a drive with physical bad sectors can only be replaced.

# badblocks -s -v /dev/sdg > badsectors.txt
Checking blocks 0 to 20970495
Checking for bad blocks (read-only test): done
Pass completed, 0 bad blocks found. (0/0/0 errors)

Note: the -v option displays details of the operation.

-s shows progress during the check.

-o writes the results of the check to the specified output file.

-w performs a write test during the check. This test is destructive: it overwrites the existing data on the blocks being tested.

You can also check individual partitions instead of the whole disk.
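As a rough sketch, the result file can be summarized with a small helper. The name count_bad_blocks is hypothetical; it assumes the file holds one block number per line, which is how badblocks writes its list.

```shell
#!/bin/sh
# count_bad_blocks: hypothetical helper; prints how many block numbers are
# listed in a badblocks result file (one block number per line).
count_bad_blocks() {
    awk '/^[0-9]+$/ { n++ } END { print n + 0 }' "$1"
}

# Example usage (read-only scan; /dev/sdg is the example device from the text):
#   badblocks -s -v -o badsectors.txt /dev/sdg
#   count_bad_blocks badsectors.txt
```

An empty file yields 0, which corresponds to a scan that found no bad blocks.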

3. Types of bad sectors

Bad sectors on a hard disk are divided into physical bad sectors and logical bad sectors.

Physical bad sectors: the disk medium itself is physically damaged. Replacing the hard disk is recommended. It is also possible to re-partition the disk to isolate the damaged area, but the disk may become unusable soon afterward, so this is not recommended.

Logical bad sectors: the check information (ECC) stored on the disk track does not match the track data. This is usually caused by misbehaving programs or by the magnetic medium in the sector starting to become unstable. Physical bad sectors are also one of the causes of logical bad sectors.

4. Repairing bad sectors

First, when bad sectors are detected, check whether the disk indicator light on the server shows an alarm. Servers generally have a disk alarm light, and a red light indicates that the disk is not working properly. This is clearly a physical problem, and the disk needs to be replaced.

Second, if the disk light shows no alarm, or the hard drive has been replaced but the check still reports bad sectors, they may be logical bad sectors, and you should try to repair them. If the repair succeeds, they were indeed logical bad sectors; if it fails, they are physical bad sectors.

4.1 Repairing logical bad sectors

View the bad sector list produced by the check above:

# tail -f badsectors.txt
205971590
205971591
205971592
205971593
205971594
205971595

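4.1.1 Back up the data first

If the important data on the hard disk or partition to be repaired has already been backed up, this step can be skipped.

# dd if=/dev/sdg bs=1024 skip=205971590 count=6 of=/tmp/205971590-205971595.dat

badblocks reports block numbers in 1024-byte units by default, so bs=1024 is needed for skip to land on the right offset (dd defaults to 512-byte blocks), and count=6 covers all six blocks from 205971590 to 205971595.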
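The arithmetic behind the dd arguments is easy to get wrong: badblocks counts in 1024-byte blocks by default while dd defaults to 512-byte blocks, and an inclusive range from 205971590 to 205971595 contains six blocks. A minimal sketch of that calculation, assuming the default badblocks block size; the helper name dd_args_for_range is hypothetical.

```shell
#!/bin/sh
# dd_args_for_range: hypothetical helper. Given the first and last bad block
# numbers (inclusive, in 1024-byte units as badblocks reports by default),
# print the bs/skip/count arguments that make dd cover exactly that range.
dd_args_for_range() {
    first=$1
    last=$2
    count=$((last - first + 1))   # inclusive range, so add 1
    echo "bs=1024 skip=$first count=$count"
}

# Example usage:
#   dd_args_for_range 205971590 205971595
```

If badblocks was run with a different -b value, the bs figure would have to change to match.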

4.1.2 Repair the disk

A hard disk cannot be repaired while it is in use, otherwise concurrent writes may cause problems, so unmount the corresponding partition before the repair. (A system partition cannot be repaired online, because it cannot be unmounted.)

# umount /data02

umount may fail with a "Device busy" error because a program is using this partition; all such processes need to be stopped first. Use fuser as shown below, where /data02 is the mount directory of the partition.

# fuser -m /data02

# fuser -m -v -i -k /data02

The first fuser command lists the processes using /data02; the second lists the PIDs and kills the processes (asking for confirmation). It is recommended to run the first command to list the PIDs and check what kind of processes they are, rather than killing them blindly.

After the partition is unmounted successfully, run the repair command below, where -s shows progress and -w performs a write repair, followed by the end (END) and start (START) block numbers. Note that END comes first and START comes after. Because -w overwrites the contents of these blocks, make sure any data you need from them has been backed up.

# badblocks -s -w /dev/sdg 205971595 205971590

Then check again after the repair:

# badblocks -s -v /dev/sdg 205971595 205971590
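Since badblocks takes the last block before the first block, the argument order is easy to reverse. The following sketch only prints a read-only check command rather than running it; the helper name badblocks_range_cmd is hypothetical.

```shell
#!/bin/sh
# badblocks_range_cmd: hypothetical helper that prints (does not run) a
# read-only badblocks command for a block range. badblocks expects the LAST
# block before the FIRST block on the command line.
badblocks_range_cmd() {
    device=$1
    first=$2
    last=$3
    if [ "$first" -gt "$last" ]; then
        echo "first block must not exceed last block" >&2
        return 1
    fi
    echo "badblocks -s -v $device $last $first"
}

# Example usage:
#   badblocks_range_cmd /dev/sdg 205971590 205971595
```

Printing the command first gives a chance to review the device name and range before anything touches the disk.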

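4.1.3 Restore the data and check again

Restore the data. seek= positions the write at the first repaired block; without it, dd would overwrite the beginning of the disk. bs must be the same block size that was used for the backup.

# dd if=/tmp/205971590-205971595.dat of=/dev/sdg bs=1024 seek=205971590 count=6

Check the partition again:

# badblocks -s -v /dev/mapper/VolGroup-lv_home > badsectors.txt

If no bad sectors are reported, the repair is complete. If some remain, try repeating the steps above.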

5. Masking bad sectors

Both the badsectors.txt file and the device file are required as arguments when running e2fsck (for ext2/ext3/ext4 file systems) or fsck.

Note: the -l option tells the command to add the block numbers listed in the specified file badsectors.txt to the bad block list. For e2fsck, these block numbers are interpreted in units of the file system block size, so badblocks should be run with a matching -b value.

------------ For ext2/ext3/ext4 file systems ------------
# e2fsck -l badsectors.txt /dev/sdb1
e2fsck 1.42.9 (28-Dec-2013)
/dev/sdb1: Updating bad block inode.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/sdb1: ***** FILE SYSTEM WAS MODIFIED *****
/dev/sdb1: 12/1310720 files (8.3% non-contiguous), 128782 blocks

------------ For other file systems ------------
$ sudo fsck -l badsectors.txt /dev/sda10

6. The fsck tool

Parameters:

filesys: the disk device name (e.g. /dev/sda1) or mount point (e.g. / or /usr)

-t: specifies the file system type; not required if it is already defined in /etc/fstab or supported by the kernel itself

-s: run the fsck operations one at a time, in sequence

-A: check all partitions listed in /etc/fstab

-C: display a progress bar for the check

-d: print the debug output of e2fsck

-P: when used together with -A, check the root file system in parallel with the other file systems

-R: when used together with -A, skip the root file system (/)

-V: verbose mode

-a: automatically repair any errors found during the check

-r: if errors are found, ask the user whether to repair each one

-y: automatically answer yes to every prompt. When you are not sure which files are damaged, you can run # fsck -y to check and repair everything.

Using smartmontools to scan for bad sectors on Linux

This method is more reliable and efficient for modern disks (ATA/SATA and SCSI/SAS hard drives and solid-state drives) that include a S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology) system. S.M.A.R.T. helps detect, report, and possibly log the health status of these disks, so you can identify impending hardware failures.

You can install smartmontools using the following commands:

------------ On Debian/Ubuntu-based systems ------------

$ sudo apt-get install smartmontools

------------ On RHEL/CentOS-based systems ------------

$ sudo yum install smartmontools

After the installation is complete, use smartctl to control the S.M.A.R.T. system integrated into the disk. You can read its manual or help text like this:

$ man smartctl
$ smartctl -h

Then run the smartctl command with your device as an argument. The -H or --health option displays the result of the SMART overall-health self-assessment test.

$ sudo smartctl -H /dev/sda10

Check the health of Linux hard drives

A result of PASSED indicates that your hard drive is healthy and hardware failures are unlikely to occur in the near future.

To get an overview of disk information, use the -a or --all option to display all SMART information about the disk, and -x or --xall to display all SMART and non-SMART information about the disk.
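For unattended checks, the health result can be tested in a script. This is a minimal sketch: the helper name smart_health_ok is hypothetical, and it assumes the two status lines that smartctl -H prints for ATA and SCSI devices respectively.

```shell
#!/bin/sh
# smart_health_ok: hypothetical helper; reads `smartctl -H` output on stdin
# and succeeds only if the overall health result is PASSED (ATA devices)
# or OK (SCSI devices).
smart_health_ok() {
    grep -E -q 'test result: PASSED|SMART Health Status: OK'
}

# Example usage:
#   sudo smartctl -H /dev/sda | smart_health_ok && echo "disk reports healthy"
```

A passing result only reflects the drive's own self-assessment, so it complements rather than replaces a badblocks scan.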