Hard disk and RAID: core technologies of the Linux operating system storage subsystem

2025-07-02 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains the role of the hard disk and RAID layers in the storage subsystem of the Linux operating system.

The storage subsystem is arguably the most complex subsystem in Linux. Other subsystems, such as memory and networking, could make the same claim, but the storage subsystem is in any case among the more complex ones. This article introduces the hard disk and RAID layers of the Linux storage subsystem; LVM and file systems are left for a later article.

Hard disk

In the Linux storage subsystem, the lowest level is the hard disk. The hard disk here does not mean the physical hardware, but the hard disk device, or block device, as seen inside Linux. Running the ls command in the /dev directory shows many devices. Among them, devices whose names start with sd are disks driven through the SCSI subsystem.

Figure 1 Block devices in Linux

A SAS-based, iSCSI-based, or FC-based disk device looks much the same. Devices named dm-X are Device Mapper block devices, that is, logical devices such as those managed by LVM.

Linux has many kinds of block devices, including local disk devices, SAN devices and network-based block devices. Inside a virtual machine a block device may appear under a different name, such as xvdX in a Xen virtual machine.
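The naming conventions above can be sketched in user space. The following is a minimal Python illustration, not kernel code: `classify_block_device` and `list_block_devices` are hypothetical helper names, and the prefix table covers only the names mentioned in this article (sd, dm-, xvd, nbd), not every naming rule in the kernel.

```python
import os
import re
import stat

# Prefixes taken from this article; the table is illustrative, not exhaustive.
_PATTERNS = [
    (re.compile(r"^sd[a-z]+\d*$"), "SCSI disk (sd driver)"),
    (re.compile(r"^dm-\d+$"), "device-mapper logical device"),
    (re.compile(r"^xvd[a-z]+\d*$"), "Xen virtual disk"),
    (re.compile(r"^nbd\d+(p\d+)?$"), "network block device"),
]


def classify_block_device(name):
    """Guess the kind of block device from its name under /dev."""
    for pattern, kind in _PATTERNS:
        if pattern.match(name):
            return kind
    return "unknown"


def list_block_devices(dev_dir="/dev"):
    """Return the sorted names of block device nodes found in dev_dir."""
    names = []
    for entry in os.scandir(dev_dir):
        try:
            mode = entry.stat(follow_symlinks=False).st_mode
        except OSError:
            continue  # node vanished or is not accessible
        if stat.S_ISBLK(mode):
            names.append(entry.name)
    return sorted(names)
```

On a Linux machine, `[(n, classify_block_device(n)) for n in list_block_devices()]` would list each block node with a guessed type; the result depends entirely on the hardware present.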

Although the names vary widely, the implementation in the Linux kernel is simple: every disk block device is registered by calling the add_disk function. The book "Linux Device Drivers" describes block devices in detail, and a block device of your own can be implemented with very little code.

Figure 2 The simplest block device driver

Two functions matter here: alloc_disk and add_disk. The former allocates a generic disk structure (gendisk), and the latter adds the block device to the kernel, which results in a "file" appearing in the /dev directory. With the code above, the following block device is generated after execution.

brw-rw---- 1 root disk 251, 0 Jun 16 09:13 /dev/sbulla

Here we chose the device name sbulla. The SCSI devices we see are defined in the same way, except that their names begin with sd.

In the code above, the important point is that a request queue is initialized with a handler function (sbull_full_request). All requests from the upper layers to access the block device are forwarded to this handler for processing.

Every block device initializes a queue and provides a request handler, although the handler differs from one kind of block device to another. For a common SCSI block device, for example, the queue and its handler are initialized as follows:

q = __scsi_alloc_queue(sdev->host, scsi_request_fn);

The nbd device (network block device, which maps a file on a server to a block device on the client over the network) initializes its queue as follows:

disk->queue = blk_init_queue(do_nbd_request, &nbd_lock);

There are many similar examples, which this article does not cover one by one. What matters is the core pattern: register the callback function that handles requests, then create the block device under the /dev directory through add_disk.

In addition, every type of block device, whether a local hard disk, a networked NBD or iSCSI device, or an FC device, ends up as a file in the /dev directory, and that file is the block device. We can access the block device by reading and writing the file.
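The register-a-handler-then-publish pattern can be mimicked in user space. The sketch below is a toy Python model, not the kernel API: `ToyBlockDevice`, `simple_request_fn`, `toy_add_disk` and `registry` are hypothetical names, and the in-memory bytearray stands in for real storage.

```python
from collections import deque

SECTOR_SIZE = 512  # assumed sector size for the toy model


class ToyBlockDevice:
    """User-space stand-in for a disk that owns a request queue."""

    def __init__(self, name, sectors, request_fn):
        self.name = name
        self.data = bytearray(sectors * SECTOR_SIZE)
        self.queue = deque()
        # Plays the role of the callback passed to blk_init_queue().
        self.request_fn = request_fn

    def submit(self, op, sector, payload=None):
        """Queue one request, then let the registered handler process it."""
        self.queue.append((op, sector, payload))
        return self.request_fn(self)


def simple_request_fn(dev):
    """Drain the queue and return the result of the last request."""
    result = None
    while dev.queue:
        op, sector, payload = dev.queue.popleft()
        start = sector * SECTOR_SIZE
        if op == "write":
            if start + len(payload) > len(dev.data):
                raise ValueError("write beyond end of device")
            dev.data[start:start + len(payload)] = payload
            result = len(payload)
        elif op == "read":
            result = bytes(dev.data[start:start + SECTOR_SIZE])
        else:
            raise ValueError("unknown operation: " + op)
    return result


registry = {}  # toy replacement for the /dev namespace


def toy_add_disk(dev):
    """Publish the device under a /dev-style name, loosely like add_disk()."""
    registry["/dev/" + dev.name] = dev
```

Swapping `simple_request_fn` for another function changes how requests are served without touching the code that submits them, which is the point of the callback design described above.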
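Because a block device is exposed as a file, ordinary file I/O is enough to reach it. The following Python sketch assumes 512-byte sectors; `read_sector` and `write_sector` are hypothetical helper names. Reading a real device such as /dev/sda normally requires root privileges, and writing to one can destroy data, so a regular file or disk image is a safer place to try it.

```python
def read_sector(path, sector, sector_size=512):
    """Read one sector from a block device node or a disk image file."""
    with open(path, "rb") as f:
        f.seek(sector * sector_size)
        return f.read(sector_size)


def write_sector(path, sector, data, sector_size=512):
    """Overwrite one sector. Dangerous on a real device: use a test image."""
    if len(data) != sector_size:
        raise ValueError("data must be exactly one sector long")
    with open(path, "r+b") as f:
        f.seek(sector * sector_size)
        f.write(data)
```

The same two functions work unchanged on a disk image file, because the kernel presents both through the same read/write interface.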

RAID

A single hard drive is usually acceptable for an ordinary user, but it is a serious risk for an enterprise application. Because a hard disk may fail at any time, we need a mechanism that ensures data is not lost when a disk fails and that the business can keep running.

RAID is the technology that solves this problem. The full name of RAID is Redundant Array of Inexpensive Disks. As the name suggests, its basic principle is to build an array out of a group of inexpensive disks. RAID not only addresses data reliability through redundancy but can also improve performance: a request is split across multiple physical disks that work in parallel, so throughput can exceed that of a single disk.

At the Linux operating system level, physical disks are abstracted into logical disks by software. Take RAID1 as an example, in which two disks store the same data so that nothing is lost if one disk fails. Software in the Linux kernel creates a virtual block device, and that block device records the underlying physical devices and the related parameters.

Figure 3 Schematic diagram of RAID1

From the user's point of view, therefore, this is an ordinary disk device, while underneath there are two independent physical disks. When the user writes data to the logical disk, the software uses the recorded parameters to redirect the data to the underlying physical devices. In this way the user's data remains intact even if one physical disk is damaged.

Besides RAID1 there are many other RAID types, each implementing different functions. For example, the striping of RAID0 mainly improves performance, while RAID1 provides data redundancy to prevent data loss caused by disk failures. Because each of these levels solves only one side of the problem, the two were combined, which led to RAID10 and RAID01; these provide both data reliability and improved performance.
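The mirroring behaviour can be sketched with a toy model. This is a minimal Python illustration under simplifying assumptions, not how the kernel's RAID1 code is written: `ToyMirror` is a hypothetical class, the two bytearrays stand in for the physical disks, and rebuilding a replaced disk is left out.

```python
class ToyMirror:
    """One logical disk backed by two in-memory 'physical' disks."""

    def __init__(self, size):
        self.size = size
        self.members = [bytearray(size), bytearray(size)]
        self.failed = [False, False]

    def fail(self, index):
        """Mark one member as broken."""
        self.failed[index] = True

    def write(self, offset, data):
        """Send the same data to every healthy member."""
        if all(self.failed):
            raise IOError("all members have failed")
        if offset + len(data) > self.size:
            raise ValueError("write beyond end of device")
        for index, member in enumerate(self.members):
            if not self.failed[index]:
                member[offset:offset + len(data)] = data

    def read(self, offset, length):
        """Serve the read from the first healthy member."""
        for index, member in enumerate(self.members):
            if not self.failed[index]:
                return bytes(member[offset:offset + length])
        raise IOError("all members have failed")
```

After `fail(0)`, reads are still served from the second member, which is the property the text describes: one damaged disk does not cost the user any data.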
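Striping comes down to an address calculation. The sketch below shows one common way to map a logical byte offset onto a member disk; `raid0_map` is a hypothetical function, and the chunk size and round-robin layout are assumptions for illustration rather than the exact layout any particular implementation uses.

```python
def raid0_map(offset, num_disks, chunk_size):
    """Map a logical byte offset to (disk index, byte offset on that disk)."""
    if offset < 0 or num_disks < 1 or chunk_size < 1:
        raise ValueError("invalid arguments")
    chunk = offset // chunk_size   # which chunk of the logical disk
    disk = chunk % num_disks       # chunks rotate across the disks
    stripe = chunk // num_disks    # how many full stripes precede it
    return disk, stripe * chunk_size + offset % chunk_size
```

Consecutive chunks land on different disks, so a large sequential request is spread over all members and can be served in parallel, which is where the performance gain of RAID0 comes from.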

Because RAID1 writes every piece of data to two devices, only 50% of the raw capacity holds usable data. RAID5 and RAID6 were devised to raise this ratio. RAID5 protects the data by adding parity (check) data. In a 5-disk RAID5, for example, the equivalent of 4 disks holds usable data, so usable capacity is 80%. RAID5 has a limitation, however: only one disk in the group may fail, and if more than one disk is damaged, data is lost. RAID6 uses a similar algorithm but can tolerate two disk failures.

At the implementation level, Linux RAID involves both user space and kernel space. The user-space part mainly manages the RAID. The kernel part cooperates with user space on management and, more importantly, handles the I/O, which is the core of RAID.
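The capacity figures and the parity idea can be checked with a few lines of Python. This is a simplified sketch: `usable_fraction`, `xor_parity` and `rebuild_missing` are hypothetical names, the parity shown is plain XOR across one stripe, and details of real implementations (such as how parity is placed across disks, or the second check block of RAID6) are not modelled.

```python
from functools import reduce
from operator import xor


def usable_fraction(level, disks):
    """Fraction of raw capacity that holds user data."""
    if level == "raid1" and disks >= 2:
        return 1 / disks            # every disk holds a full copy
    if level == "raid5" and disks >= 3:
        return (disks - 1) / disks  # one disk's worth of parity
    if level == "raid6" and disks >= 4:
        return (disks - 2) / disks  # two disks' worth of check data
    raise ValueError("unsupported level or too few disks")


def xor_parity(blocks):
    """XOR equal-length blocks together byte by byte."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must be non-empty and of equal length")
    return bytes(reduce(xor, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing data block of a stripe."""
    return xor_parity(list(surviving_blocks) + [parity])
```

For a 5-disk RAID5, `usable_fraction("raid5", 5)` gives 0.8, matching the 80% in the text. XOR parity can recover exactly one missing block per stripe, which is why a second simultaneous failure loses data in RAID5.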

Figure 4 Software architecture

For RAID built on SCSI physical disks, the overall software architecture in Linux is shown in Figure 4. Above the dotted line are the user-space software modules, and below it are the kernel-space modules. The core is the RAID common layer, which creates the md devices; an md device is a logical device and is also the RAID device that users see. Below it are the specific RAID modules that implement the different RAID levels (algorithms).

Further down is the general SCSI driver layer, shown in the figure as the SCSI disk driver layer. This is actually the upper-level driver of the SCSI subsystem (which is divided into three layers). The RAID modules read and write physical disk data by calling this layer's data access interface.
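From user space, the state of md devices is commonly inspected through the /proc/mdstat file. The sketch below parses a small sample of that text; `parse_mdstat` is a hypothetical helper, `SAMPLE_MDSTAT` is made-up illustrative content rather than output captured from a real system, and only simple lines for active arrays are handled.

```python
import re

# Illustrative sample only; real content depends on the machine.
SAMPLE_MDSTAT = """Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      1048512 blocks [2/2] [UU]

unused devices: <none>
"""

_ARRAY_LINE = re.compile(r"^(md\d+) : (\w+) (\w+) (.*)$")
_MEMBER = re.compile(r"(\w+)\[\d+\]")


def parse_mdstat(text):
    """Extract name, state, level and member devices of active md arrays."""
    arrays = []
    for line in text.splitlines():
        match = _ARRAY_LINE.match(line)
        if not match:
            continue  # header, detail and footer lines are skipped
        name, state, level, rest = match.groups()
        arrays.append({
            "name": name,
            "state": state,
            "level": level,
            "members": _MEMBER.findall(rest),
        })
    return arrays
```

On a machine with software RAID, passing the contents of /proc/mdstat to `parse_mdstat` gives a quick view of which physical devices sit beneath each logical md device.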
