
How to deal with RAID 10 faults in ubuntu server


This article is about how to deal with RAID 10 failures on an Ubuntu server. The editor thinks it is very practical and shares it with you as a reference; follow along and have a look.

◆ Fault handling

Let's simulate what happens when a RAID fails.

◆ Removing devices from the RAID

A device that is in use cannot be removed directly; you must first mark it as failed. Likewise, if a device in your RAID array fails and you want to remove it, you also need to mark it as failed first.

1. Remove a single RAID physical volume

Suppose the sda1 partition has a problem and we need to remove it.

Let's remove the physical volume sda1 from RAID:

$ sudo mdadm /dev/md0 --fail /dev/sda1 --remove /dev/sda1

mdadm: set /dev/sda1 faulty in /dev/md0

mdadm: hot removed /dev/sda1

If you plan to reuse the removed device for other purposes, you must clear its superblock; otherwise the system will assume that the device still belongs to a RAID array:

$ sudo mdadm --zero-superblock /dev/sda1
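To confirm that the superblock is really gone, you can examine the device again. This check is not part of the original text, but with a standard mdadm it looks something like:

$ sudo mdadm --examine /dev/sda1

mdadm: No md superblock detected on /dev/sda1.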

2. Remove the entire hard drive

To remove the entire hard drive from the RAID, you need to first remove all RAID physical volumes from the hard drive.

For example, to remove the entire first hard disk, sda, you need to mark sda1, sda2, and sda3 as failed and then remove them all:
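The individual commands are not reproduced in the original; based on the single-partition example above, the sequence is presumably:

$ sudo mdadm /dev/md0 --fail /dev/sda1 --remove /dev/sda1

$ sudo mdadm /dev/md1 --fail /dev/sda2 --remove /dev/sda2

$ sudo mdadm /dev/md2 --fail /dev/sda3 --remove /dev/sda3

(The output shown below is for the last partition, sda3.)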

mdadm: set /dev/sda3 faulty in /dev/md2

mdadm: hot removed /dev/sda3

Now, if you are on a hot-swappable server, you can unplug the hard drive.

◆ Adding back an existing RAID physical volume

To add a device to a RAID array, use the --add option.

If a RAID physical volume has already been created on the device being added, such as the sda1, sda2, and sda3 partitions we just removed, adding it back is simple:

$ sudo mdadm /dev/md0 --add /dev/sda1

mdadm: re-added /dev/sda1

$ sudo mdadm /dev/md1 --add /dev/sda2

mdadm: re-added /dev/sda2

$ sudo mdadm /dev/md2 --add /dev/sda3

mdadm: re-added /dev/sda3

◆ Replacing with a brand-new hard drive

1. Remove the bad hard drive

Assuming that the entire sda is no longer available, we need to replace it with a brand new hard drive. First, remove all partitions of sda from the RAID:

$ sudo mdadm /dev/md0 --fail /dev/sda1 --remove /dev/sda1

$ sudo mdadm /dev/md1 --fail /dev/sda2 --remove /dev/sda2

$ sudo mdadm /dev/md2 --fail /dev/sda3 --remove /dev/sda3

After removing it, check the RAID status to see if it has actually been removed:
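The check itself is not shown in the original; it is usually done by reading /proc/mdstat:

$ cat /proc/mdstat

Each mdX line in the output lists its member devices, and the [_UUU] pattern shows which slots are up.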

We can see that the sda device is no longer in the RAID, and the RAID 10 status now shows only three devices up: [_UUU].

Now, if you are on a hot-swappable server, you can unplug the hard drive.

2. Insert the new hard drive

Although the first hard drive has been removed from the RAID, the system can still boot. This is because sdb has become the first hard disk, and hd0 in the grub configuration now actually refers to sdb. So even though the first hard drive is broken, the system can still boot without changing the grub configuration.

If you are experimenting on a real server that supports hot swapping, you don't need to restart it; just unplug the "bad" hard drive and replace it with a new one.

Since we are experimenting in VMware, which does not support hot swapping here, we must shut the machine down to add a new hard disk:

$ sudo halt

After shutting down, in VMware, first add the new hard drive and then delete the original first "bad" hard drive. Note that if you delete the old hard drive before adding the new one, VMware will attach the new disk as SCSI0:0, i.e. sda; since this is a brand-new drive with no grub on it, making it sda would leave the system unable to boot.

After the new hard drive is added, power the machine back on.

Yes, even with the first hard drive missing and only 3 good hard drives left, the system boots normally. Now you can see the advantage of this setup.

After the system boots, let's take a look at the existing hard drives:

$ sudo fdisk -l

You should see that the three original hard drives have all shifted forward: the original sdb is now sda, and the newly added hard drive is sdd. (If you are on a real server and used hot plugging to add the new drive without restarting, the new drive will still be sda.)

3. Partition the new hard disk

For convenience, we directly copy the partition information of the existing hard disk to the new hard disk:

$ sudo sfdisk -d /dev/sda | sudo sfdisk /dev/sdd

Now our new hard drive is partitioned and ready to join the RAID.

4. Add the new partitions to the RAID

Before adding a new partition to RAID, let's take a look at the details of md1:
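The command is not reproduced in the original; it would normally be:

$ sudo mdadm --detail /dev/md1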

As you can see, only 3 devices are working in md1, and the original first device has been removed.

OK, now let's add sdd2 to md1:

$ sudo mdadm /dev/md1 --add /dev/sdd2

After the command is executed, mdadm rebuilds md1; you can check the rebuild progress and status:

$ sudo mdadm --detail /dev/md1

[...]

Rebuild Status: 7% complete

[...]

Number Major Minor RaidDevice State

4 8 50 0 spare rebuilding /dev/sdd2

[...]

Depending on the size of the partition, the rebuild will take more or less time. When the rebuild has finished, the state should look as shown below:
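The final output is not reproduced in the original; once the rebuild completes, mdadm --detail /dev/md1 typically reports something like this (illustrative values only):

[...]

State : clean

[...]

4 8 50 0 active sync /dev/sdd2

[...]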

Then, we rebuild md0 and md2 as well:

$ sudo mdadm /dev/md0 --add /dev/sdd1

$ sudo mdadm /dev/md2 --add /dev/sdd3
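To monitor all three arrays rebuilding at once, a convenient check (not shown in the original) is:

$ watch cat /proc/mdstat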

5. Set up grub

Finally, you also need to set up grub, otherwise the system cannot boot from the new hard disk:

$ sudo grub

grub> root (hd3,0)

grub> setup (hd3)

grub> quit

If you are on a real server and the new hard drive is sda, you should install grub on hd0.

OK, now the new hard drive has been added to the RAID to replace the bad hard drive.

◆ Adding a spare hard disk

If we have a spare device in our RAID array, then when a device fails the system will automatically bring the spare into service in its place, so we don't need to replace it manually.

We can prepare a spare hard drive for the RAID as early as when we install Ubuntu, using the -x or --spare-devices= option to add spare physical volumes when creating the RAID devices.
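As an illustration only (the device names here are hypothetical, not from the original), creating a four-disk RAID 10 array with one spare at creation time might look like:

$ sudo mdadm --create /dev/md0 --level=10 --raid-devices=4 --spare-devices=1 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1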

Of course, we can also add a spare hard drive afterwards; we just can't use the --spare-devices= option then, and must use --add instead.

Before adding, let's check to see if there are any spare devices in RAID:

$ sudo mdadm --detail /dev/md1 | grep Spare

Spare Devices: 0

As you can see, there are no spare devices in the current RAID array.

◆ Inserting a new hard drive

Now, let's add a new hard drive to the server by following the steps described in the previous section.

After the system boots, let's take a look at the existing hard drives:

$ sudo fdisk -l

You should see an sde device, which is our new hard drive.

◆ Partitioning the new hard disk

In order to simplify the operation, we will directly copy the partition information of the existing hard disk to the new hard disk:

$ sudo sfdisk -d /dev/sda | sudo sfdisk /dev/sde

OK, now that our new hard drive is partitioned, we can add it to the RAID.

◆ Adding the new partitions to the RAID

Let's add the three partitions on the new hard disk to the three RAID arrays md0, md1 and md2 respectively:

$ sudo mdadm /dev/md0 --add /dev/sde1

$ sudo mdadm /dev/md1 --add /dev/sde2

$ sudo mdadm /dev/md2 --add /dev/sde3

Now check to see if there are any spare devices in md0:
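The exact command is not shown in the original; one way to see all the device counts at once is:

$ sudo mdadm --detail /dev/md0 | grep Devices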

As you can see, the total number of devices, the number of working devices, and the number of spare devices have all changed.

Similarly, you can check the details of md1 and md2; each should now have a spare device.

◆ Setting up grub

We need to set up grub on the spare hard drive in advance, just in case:

$ sudo grub

grub> root (hd4,0)

grub> setup (hd4)

grub> quit

◆ Fault simulation

Now let's assume that sda1 fails, and mark it as failed:

$ sudo mdadm /dev/md0 --fail /dev/sda1

mdadm: set /dev/sda1 faulty in /dev/md0

Now, let's see if the standby device has been automatically enabled:
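The check is not shown in the original; you can look at the array details again, for example:

$ sudo mdadm --detail /dev/md0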

As you can see, the spare device sde1 has indeed automatically been brought into active service, while sda1 is marked as failed.

Once the sda1 fault is fixed, we can re-add it to the RAID as a spare device. Because sda1 has been marked as failed, we must first remove it and then add it back:

$ sudo mdadm /dev/md0 --remove /dev/sda1

$ sudo mdadm /dev/md0 --add /dev/sda1
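To confirm that the repaired device has come back as a spare, you can repeat the earlier check (not shown in the original); it should now report one spare device:

$ sudo mdadm --detail /dev/md0 | grep Spare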

◆ Expanding the storage space of RAID 10

Suppose the system runs out of disk space and you need to add a new hard disk.

Unfortunately, at the time of writing mdadm only supports grow operations on RAID 1, RAID 5, and RAID 6, which means we cannot directly expand the storage space of a RAID 10 array.

If we have to add new hard drives and space to the existing RAID10 array, we have to follow these steps:

(1) backup data

(2) create a new RAID array

(3) restore data.

That is, if you want to expand the storage space of RAID10, you must rebuild it.
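A minimal sketch of that rebuild route, assuming a hypothetical array /dev/md3 built from four partitions and assuming the data has already been backed up elsewhere:

$ sudo mdadm --stop /dev/md3

$ sudo mdadm --create /dev/md3 --level=10 --raid-devices=4 /dev/sda4 /dev/sdb4 /dev/sdc4 /dev/sdd4

$ sudo mkfs.ext4 /dev/md3

Then mount the new array and restore the backed-up data onto it.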

However, we can combine RAID with LVM: by building LVM on top of RAID (a RAID+LVM scheme), we can expand the storage space at will. This is what we will introduce in the next chapter.

Thank you for reading! This is the end of the article on "how to deal with RAID 10 faults in ubuntu server". I hope the above content has been helpful and that you have learned something from it. If you think the article is good, feel free to share it so more people can see it!
