What is the use of RAID disk array

2025-04-17 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/01 Report--

This article explains what a RAID disk array is used for. Many readers are not very familiar with the topic, so it is shared here for reference.

In the stand-alone era, data was stored on and read from a single disk. Because of the time spent on seeking and on reading and writing, I/O performance was very low, and storage capacity was limited. In addition, a single disk is prone to physical failure, which often leads to data loss. So people wondered whether multiple independent disks could be combined into one solution that improves both data reliability and I/O performance. RAID disk arrays are that solution.

Brief introduction

The full name of RAID is Redundant Array of Independent Disks. The basic idea is to combine several relatively cheap hard disks into a disk array whose performance matches or even exceeds that of a single expensive, high-capacity hard disk. RAID is usually used on servers and is typically built from identical hard disks that form one logical disk, so the operating system treats the array as a single hard disk.

RAID is divided into different levels, each of which makes different tradeoffs in data reliability and reading and writing performance. In practical application, we can choose different RAID schemes according to our actual needs.

Standard RAID

RAID 0

RAID 0, known as striping, splits data into segments and stores them across all disks, so both reads and writes can be processed in parallel. Its read and write rate is therefore, in theory, N times that of a single disk (N is the number of disks in the RAID 0 array). However, there is no data redundancy, and the failure of a single disk makes the data unrecoverable.

Most striping implementations let administrators define how data is segmented and written to disk by adjusting two key parameters that strongly affect RAID 0 performance. Stripe width is the number of stripes that can be written in parallel, which equals the number of disks in the array. Stripe size, also known as block size (chunk size, stripe length, granularity), is the size of the data block written to each disk. Block-striped RAID usually allows block sizes from 2 KB to 512 KB or higher, but the size must be a power of 2. Byte-striped RAID (such as RAID 3) generally uses a stripe size of 1 or 512 bytes, which the user cannot adjust.

The impact of stripe size on performance is hard to measure in a simple way; it is best to adjust it and observe the effect under your own workload. In general, reducing the stripe size splits files into smaller chunks spread over more disks, which speeds up data transfer but costs positioning (seek) performance; increasing it has the opposite effect. There is no theoretically optimal value. In many cases the strategy of the disk controller must also be considered. For example, some disk controllers wait until a certain amount of data has accumulated before writing to disk.
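To make the two parameters concrete, here is a minimal sketch of how a striped array might map a logical byte offset to a disk. The function name `locate` and the round-robin chunk order are assumptions for illustration; real controllers may lay chunks out differently.

```python
def locate(offset: int, stripe_size: int, stripe_width: int) -> tuple[int, int]:
    """Map a logical byte offset to (disk index, byte offset on that disk)."""
    chunk = offset // stripe_size      # index of the chunk in the logical volume
    disk = chunk % stripe_width        # chunks are assumed to rotate across disks
    row = chunk // stripe_width        # number of full stripes before this chunk
    return disk, row * stripe_size + offset % stripe_size


# 4 disks with 64 KiB chunks: consecutive chunks land on consecutive disks.
KIB = 1024
for off in (0, 64 * KIB, 128 * KIB, 256 * KIB):
    print(off, locate(off, 64 * KIB, 4))
```

A request larger than one chunk touches several disks at once, which is where the parallelism of RAID 0 comes from.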

RAID 1

Mirrored storage (mirroring), with no parity. The same data is written to two or more disks, so writes are somewhat slower while reads are faster. Read speed can approach the sum of the throughput of all disks, and write speed is limited by the slowest disk.

RAID 1 also has the lowest disk utilization. If you build a RAID 1 from two disks of different sizes, the usable capacity is that of the smaller disk; the extra part of the larger disk can be used for other purposes and is not wasted.

RAID 2

An improved version of RAID 0 that adds Hamming code error checking.

A Hamming code can detect up to two simultaneous bit errors and correct single-bit errors. The number of check bits and the number of data bits satisfy the inequality 2^P ≥ P + D + 1, where P is the number of Hamming check bits and D is the number of data bits. For example, 4 data bits require 3 check bits, 7 data bits require 4 check bits, and 64 data bits require 7 check bits. RAID 2 stripes data at the 1-bit level, so D and P are also the numbers of data disks and check disks. The wider the data word, the smaller the proportion of disks used for checking. Because a Hamming code can correct a single-bit error, RAID 2 can correct the data when a single disk is damaged.

Every read and write in RAID 2 involves the whole set of disks. To maximize performance, it is best to synchronize the disk spindles so that all heads are over the same logical sector at the same time and the disks are accessed together. If the spindles are not synchronized, the array has to wait and speed suffers.

Compared with RAID 0, RAID 2 has a better transfer rate. The typical stripe size of RAID 0 is large compared with the 1-bit striping of RAID 2, so RAID 0 cannot guarantee that every access is spread across multiple disks in parallel, whereas RAID 2 can. To benefit from this, disk seek time must be minimized (seek time is several orders of magnitude larger than data transfer time), so RAID 2 suits sequential I/O and large-block I/O (such as video streaming services).
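The inequality can be checked with a few lines of Python. This is only a sketch of the arithmetic; the helper name `hamming_check_bits` is made up for this example.

```python
def hamming_check_bits(data_bits: int) -> int:
    """Smallest P that satisfies 2**P >= P + D + 1 for D data bits."""
    p = 1
    while 2 ** p < p + data_bits + 1:
        p += 1
    return p


for d in (4, 7, 64):
    p = hamming_check_bits(d)
    # p / (p + d) is the share of disks that would hold check bits in RAID 2
    print(f"D={d}: P={p}, check share={p / (p + d):.2f}")
```

The check share shrinks as D grows, which matches the statement that wider data words spend proportionally fewer disks on verification.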

RAID 3

Similar to RAID 2, data is striped across different hard disks, here in units of bytes, but RAID 3 uses a single dedicated disk to store simple parity information, so the total number of disks is N+1. When any one of the N+1 hard disks fails, the original data can be recovered from the other N disks. When the failed disk is replaced, the system can rebuild the complete data and parity information.

Because the probability of more than one hard disk in an array failing at the same time is very small, RAID 3 generally keeps data safe. RAID 3 distributes data writes across multiple disks, but no matter which data disk is written, the corresponding information on the check disk must be rewritten at the same time. For applications that perform a large number of writes, the check disk becomes overloaded and cannot keep up, which degrades the performance of the whole RAID system. For this reason, RAID 3 is better suited to applications with few writes and many reads, such as databases and web servers.
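The recovery step relies on XOR being its own inverse. The sketch below uses three in-memory byte strings as stand-ins for N = 3 data disks; the block contents and the helper `xor_blocks` are illustrative assumptions, not part of any RAID tool.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"RAID", b"disk", b"demo"]   # contents of the N data disks
parity = xor_blocks(data)            # contents of the single check disk

# Disk 1 fails: XOR the surviving data disks with the check disk.
recovered = xor_blocks([data[0], data[2], parity])
assert recovered == data[1]
```

The same calculation rebuilds the check disk itself if that is the one that fails.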

RAID 4

Similar to RAID 3, but RAID 4 accesses data by block (sector). Unlike RAID 3, where even a small I/O operation involves the whole group, a small I/O in RAID 4 involves only two hard disks in the group (one data disk and the check disk), which speeds up small accesses.
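One way to see why a small write needs only two disks is the read-modify-write parity update, sketched below under the assumption of simple XOR parity. The names `xor` and `small_write` are hypothetical helpers for this illustration.

```python
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """Parity after one block changes: P' = P xor D_old xor D_new."""
    return xor(xor(old_parity, old_data), new_data)


disks = [b"aaaa", b"bbbb", b"cccc"]
parity = xor(xor(disks[0], disks[1]), disks[2])

# Overwrite the block on disk 1; only disk 1 and the check disk are touched.
new_block = b"BBBB"
parity = small_write(disks[1], new_block, parity)
disks[1] = new_block

# The shortcut gives the same parity as recomputing over all data disks.
assert parity == xor(xor(disks[0], disks[1]), disks[2])
```

The other data disks are never read, but every such write still hits the one check disk.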

RAID 5

Parity (XOR), with data striped in blocks. The parity information is interleaved across all disks rather than kept on a dedicated disk. RAID 5 stores data and the corresponding parity information on the member disks, with each piece of parity on a different disk from the data it protects. Any N-1 disks hold the complete data; in other words, space equal to the capacity of one disk is used for parity. Therefore, when one disk of a RAID 5 fails, data integrity is not affected. When the damaged disk is replaced, RAID automatically uses the remaining data and parity information to reconstruct the contents of that disk and restore the redundancy of RAID 5.
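The interleaving can be pictured with a small layout generator. The rotation rule used here (parity starts on the last disk and moves one disk to the left per stripe) is just one possible scheme and is an assumption; actual implementations offer several layouts.

```python
def raid5_layout(num_disks: int, num_stripes: int):
    """Return one row per stripe, e.g. ['D0', 'D1', 'D2', 'P']."""
    rows, block = [], 0
    for stripe in range(num_stripes):
        parity_disk = (num_disks - 1 - stripe) % num_disks  # assumed rotation
        row = []
        for disk in range(num_disks):
            if disk == parity_disk:
                row.append("P")
            else:
                row.append(f"D{block}")
                block += 1
        rows.append(row)
    return rows


for row in raid5_layout(4, 4):
    print(" ".join(f"{cell:>3}" for cell in row))
```

Over four stripes each of the four disks holds exactly one parity block, so no single disk becomes the parity bottleneck.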

RAID 5 can be understood as a compromise between RAID 0 and RAID 1. RAID 5 can provide data security for the system, but the degree of protection is lower than that of mirrors, and the utilization of disk space is higher than mirrors. RAID 5 has a data read speed similar to that of RAID 0, but because of the addition of parity information, the speed of writing data is slightly slower than that of writing to a single hard disk.

RAID 6

Similar to RAID 5, but with a second independent parity block added. The two parity systems use different algorithms, so data reliability is very high: even if two disks fail at the same time, the data remains usable. However, RAID 6 must allocate more disk space to parity and has a greater write penalty than RAID 5, so its write performance is poor.

As you can see from the figure, each hard disk holds not only the XOR check area for the data in the same stripe, but also a second check area for each data block. The check data for a block is of course never stored on the same disk as that block; it is interleaved across the disks. From a mathematical point of view, RAID 5 uses one equation to solve for one unknown, while RAID 6 uses two independent linear equations to form a system of equations and can therefore recover two unknown pieces of data.

As hard disk capacity grows, RAID 6 becomes more and more important. Data loss is more likely on terabyte-class hard drives, and data reconstruction (for example in RAID 5, which tolerates only one failed drive) takes longer and longer, even weeks, which is unacceptable. RAID 6 tolerates two simultaneous drive failures, so it is increasingly favored.

With the advent of CDs, DVDs, and Blu-ray discs, erasure code technology appeared in storage media, allowing a disc to be played even when its surface is scratched. Most common erasure code algorithms evolved from the Reed-Solomon codes developed at the Lincoln Laboratory of the Massachusetts Institute of Technology in the 1960s. In practice, most RAID 6 implementations combine standard RAID 5 parity with Reed-Solomon codes. Using a pure erasure code algorithm lets an array tolerate the failure of more than two hard disks, giving stronger protection. Some implementations provide multiple levels of protection and even let users (or storage administrators) specify the protection level.
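The "two equations, two unknowns" idea can be sketched with a P block (plain XOR) and a Q block (a weighted sum in the finite field GF(2^8)). This P+Q construction, the generator 2, and the polynomial 0x11D are assumptions chosen because they are a common way to build RAID 6; the article does not say which scheme its figure uses.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11D)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    # a**255 == 1 for every non-zero a, so a**254 is the inverse of a.
    return gf_pow(a, 254)


def syndromes(blocks):
    """Return (P, Q): P = xor of D_i, Q = xor of 2**i * D_i in GF(2^8)."""
    size = len(blocks[0])
    p, q = bytearray(size), bytearray(size)
    for i, block in enumerate(blocks):
        coeff = gf_pow(2, i)
        for k, byte in enumerate(block):
            p[k] ^= byte
            q[k] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def recover_two(blocks, x, y, p, q):
    """Rebuild data blocks x and y (given as None in `blocks`) from P and Q."""
    size = len(p)
    known = [b if b is not None else bytes(size) for b in blocks]
    p_part, q_part = syndromes(known)      # zero blocks contribute nothing
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    denom_inv = gf_inv(gx ^ gy)
    dx, dy = bytearray(size), bytearray(size)
    for k in range(size):
        pxy = p[k] ^ p_part[k]             # equation 1: Dx ^ Dy
        qxy = q[k] ^ q_part[k]             # equation 2: gx*Dx ^ gy*Dy
        dx[k] = gf_mul(denom_inv, qxy ^ gf_mul(gy, pxy))
        dy[k] = pxy ^ dx[k]
    return bytes(dx), bytes(dy)


data = [b"alpha", b"bravo", b"delta", b"gamma"]
p, q = syndromes(data)
damaged = [data[0], None, data[2], None]   # disks 1 and 3 are lost
d1, d3 = recover_two(damaged, 1, 3, p, q)
assert (d1, d3) == (data[1], data[3])
```

Losing one data disk plus P, or one data disk plus Q, reduces to a single equation and is simpler; the case shown is the hardest one.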

Hybrid RAID

RAID 01

As the name implies, it is a combination of RAID0 and RAID1. First do the stripe (0), and then do the mirror (1).

RAID 10

Same as above, but first mirror (1), then stripe (0)

RAID 01 and RAID 10 are very similar, and there is no difference in read and write performance. But RAID 10 is better than RAID 01 in terms of data safety. As shown in the figure, assume DISK0 is damaged. In RAID 10, among the remaining 3 disks, the whole array fails only if DISK1 also fails. In RAID 01, once DISK0 is damaged the left stripe can no longer be read, and among the remaining 3 disks, damage to either DISK2 or DISK3 causes the array to fail.

RAID 10 and RAID 5 are two schemes that are often compared, and both are widely used in production. RAID 10 is safer, but its space utilization is lower. Read and write performance depends heavily on caching, so it is best to compare and choose according to the actual situation.
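The difference can be checked by enumerating second failures on a four-disk array. The disk numbering below (0 and 1 form one group, 2 and 3 the other) is an assumption made to match the DISK0 to DISK3 description above.

```python
MIRROR_PAIRS = [(0, 1), (2, 3)]   # RAID 10: each pair is a mirror, pairs are striped
STRIPE_SETS = [(0, 1), (2, 3)]    # RAID 01: each set is a stripe, sets are mirrored


def raid10_alive(failed):
    # Survives while every mirror pair still has at least one good disk.
    return all(any(d not in failed for d in pair) for pair in MIRROR_PAIRS)


def raid01_alive(failed):
    # Survives while at least one stripe set has all of its disks intact.
    return any(all(d not in failed for d in s) for s in STRIPE_SETS)


def fatal_second_failures(alive, first=0, disks=range(4)):
    """Disks whose failure, after `first` has failed, brings the array down."""
    return [d for d in disks if d != first and not alive({first, d})]


print("RAID 10:", fatal_second_failures(raid10_alive))  # [1]
print("RAID 01:", fatal_second_failures(raid01_alive))  # [2, 3]
```

After DISK0 fails, one of the three remaining disks is fatal for RAID 10 and two of three are fatal for RAID 01.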
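The space-utilization trade-off can be put into numbers. The sketch assumes n equal-sized disks and, for RAID 10 and RAID 01, two-way mirroring; the function `usable_disks` is a hypothetical helper.

```python
def usable_disks(level: str, n: int) -> int:
    """Disks' worth of usable space in an array of n equal-sized disks."""
    if level == "0":
        return n            # no redundancy
    if level == "1":
        return 1            # every disk holds a full copy
    if level == "5":
        return n - 1        # one disk's worth of parity
    if level == "6":
        return n - 2        # two disks' worth of parity
    if level in ("10", "01"):
        return n // 2       # assumes two-way mirrors
    raise ValueError(f"unsupported level: {level}")


n = 8
for level in ("0", "5", "6", "10"):
    print(f"RAID {level}: {usable_disks(level, n) / n:.0%} of raw capacity")
```

With more disks the RAID 5 share rises toward 100 percent, while two-way RAID 10 stays at half.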

Non-standard RAID

DRFS

DRFS, or DistributedRaidFileSystem, is a technology that attempts to combine RAID with Hadoop's DFS (HDFS). In practice, HDFS usually needs a replication factor of 3 to ensure data integrity. With RAID's striping and parity techniques, the data is instead divided into multiple blocks, and check information (XOR or erasure code) is stored for the blocks. With these measures, the number of block replicas can be reduced while keeping the same data reliability, which saves a considerable amount of storage space.
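The storage saving is simple arithmetic. In the sketch below, the stripe length of 10 blocks, the parity counts, and the replica counts are illustrative assumptions, not values stated in this article.

```python
def storage_overhead(data_blocks, parity_blocks, data_replicas, parity_replicas):
    """Raw blocks stored per block of user data."""
    raw = data_blocks * data_replicas + parity_blocks * parity_replicas
    return raw / data_blocks


# Plain replication with factor 3, no parity.
print(storage_overhead(10, 0, 3, 0))
# XOR parity: 1 parity block per 10 data blocks, everything kept twice (assumed).
print(storage_overhead(10, 1, 2, 2))
# Reed-Solomon: 4 parity blocks per 10 data blocks, single copies (assumed).
print(storage_overhead(10, 4, 1, 1))
```

Under these assumptions the raw space per block of data drops from 3 blocks to 2.2 and then to 1.4.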

DRFS consists of the following components:

DRFS client: provides the interface through which applications access DRFS, and repairs files found to be corrupted when read. The whole operation is transparent to the application.

RaidNode: the daemon that creates and maintains parity files.

BlockFixer: periodically checks files, recalculates checksums, and repairs files.

RaidShell: similar to the hadoop shell.

ErasureCode: the algorithm DRFS uses to generate check codes, either XOR or Reed-Solomon. XOR can create only one parity byte, while Reed-Solomon can create many (the more parity, the more data can be recovered). With Reed-Solomon, replication can even be reduced to 1, at the cost of reduced read/write parallelism (data can only be read and written from a single machine).

Implementation

Software implementation

Nowadays most operating systems provide a software implementation of RAID, mainly in the following forms:

RAID created by software across multiple devices, such as the mdadm tool on Linux. For specific usage, see the examples in the reference link.

Virtual volume management tools, such as LVM or Veritas.

File system implementations: btrfs, ZFS, GPFS. These file systems can manage data on multiple devices directly and provide functions similar to the various RAID levels.

RAID systems (RAID-F) that provide data verification on top of existing file systems.
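As a rough illustration of the mdadm route, the sketch below only assembles a command line and prints it; it does not run anything. The array name `/dev/md0` and the member devices are placeholders, and creating an array destroys existing data on the members, so check the mdadm manual before adapting it.

```python
import shlex


def mdadm_create_command(array, level, devices):
    """Build (but do not run) an mdadm command that creates a software RAID."""
    return [
        "mdadm", "--create", array,
        f"--level={level}",
        f"--raid-devices={len(devices)}",
        *devices,
    ]


# Placeholder device names; substitute the real block devices on your system.
cmd = mdadm_create_command("/dev/md0", 5, ["/dev/sdb", "/dev/sdc", "/dev/sdd"])
print(shlex.join(cmd))
```

Keeping the command as a list of arguments makes it easy to review first and to pass to `subprocess.run` later if desired.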

Firmware / driver implementation

Software implementations are not always compatible with the system's boot process, while hardware implementations (RAID controllers) tend to be expensive and rely on the manufacturer's proprietary technology. Hence a hybrid approach: during system startup the firmware implements the RAID, and once the system has largely booted, a driver takes over managing it. Of course, this requires the operating system to support that driver.

