What is the Linux UBI subsystem for Flash?

2025-07-06 Update. From: SLTechnology News&Howtos, Shulou (Shulou.com), 06/01 report.

This article describes the Linux UBI subsystem for flash in detail: how data is laid out on flash, how the kernel reads and organizes it in memory, and how wear leveling works.

Introduction to UBI

UBI stands for Unsorted Block Images. The figure above shows where UBI sits in the system. At the bottom is the flash layer (the flash controller, the individual flash drivers, the spi-mem layer, and so on). The MTD layer is an abstraction of the flash layer: a flash chip may be divided into several partitions, and each partition corresponds to one MTD device. The UBI layer sits on top of the MTD layer and abstracts logical erase blocks, each of which is mapped to a physical erase block. With this mapping in place, software algorithms can implement wear leveling and so extend the service life of the flash. File systems such as UBIFS are then implemented on top of the UBI layer.

Content stored by flash

First, a few concepts:

PEB: physical eraseblock, an erase block on the flash itself.

LEB: logical eraseblock, a software concept.

Volume: a volume, a group of LEBs.

The figure above shows how data is organized in flash (or in one flash partition):

The UBI layer manages flash in units of erase blocks. An LEB is a software concept, a PEB is a real erase block on flash, and each LEB corresponds to a PEB.

Moving up a level, multiple LEBs form a volume; that is, LEBs are grouped into volumes according to their purpose. The volume-layout volume is used internally by UBI to store information about every volume on the MTD device. It contains two LEBs that hold the same content and back each other up.

Each PEB consists of three parts: ech (erase counter header), vidh (volume identifier header), and data. Their meaning is described below.

Code implementation

The Linux implementation of the UBI layer can be summarized in three parts:

First, the data lives in flash, so the relevant information must be read from flash into memory; bad blocks can be detected along the way.

Once in memory, the data is organized according to its internal relationships (for example, PEBs in use are kept on one red-black tree and free PEBs on another).

With these relationships in memory, operations can be performed on them: reads and writes, adding, deleting, and resizing volumes, and wear leveling.

Read flash data into memory

The figure above shows the call sequence at UBI initialization. It ends in the scan_all() function, which traverses every PEB of the MTD device and reads out its ech and vidh. These are defined as follows.

The definition of ech is shown above, where:

ec: the number of times the PEB has been erased. With this field we can find the PEB with the fewest erasures, which is what makes wear leveling possible.

vid_hdr_offset: the offset of the vidh within this PEB.

data_offset: the offset of the actual data within this PEB.
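The fields above can be illustrated with a small user-space decoder. This is only a sketch modelled on the kernel's struct ubi_ec_hdr: the names ec_hdr_raw, ec_info, be_to_u64 and decode_ec_hdr are made up for this example, every field is stored as a byte array to sidestep packing and endianness issues, and the header CRC is deliberately not checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UBI_EC_HDR_MAGIC 0x55424923u /* ASCII "UBI#" */

/* Illustrative mirror of the on-flash erase counter header.
 * All multi-byte fields are big-endian on flash. */
struct ec_hdr_raw {
    uint8_t magic[4];
    uint8_t version;
    uint8_t padding1[3];
    uint8_t ec[8];             /* erase counter */
    uint8_t vid_hdr_offset[4]; /* where the vidh starts in this PEB */
    uint8_t data_offset[4];    /* where the data starts in this PEB */
    uint8_t image_seq[4];
    uint8_t padding2[32];
    uint8_t hdr_crc[4];        /* not verified in this sketch */
};

/* Hypothetical decoded view holding only the fields discussed in the text. */
struct ec_info {
    uint64_t ec;
    uint32_t vid_hdr_offset;
    uint32_t data_offset;
};

/* Read an n-byte big-endian integer (n <= 8). */
uint64_t be_to_u64(const uint8_t *p, size_t n)
{
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Returns 0 on success, -1 if the magic number does not match. */
int decode_ec_hdr(const struct ec_hdr_raw *raw, struct ec_info *out)
{
    assert(raw != NULL && out != NULL);
    if (be_to_u64(raw->magic, 4) != UBI_EC_HDR_MAGIC)
        return -1;
    out->ec = be_to_u64(raw->ec, 8);
    out->vid_hdr_offset = (uint32_t)be_to_u64(raw->vid_hdr_offset, 4);
    out->data_offset = (uint32_t)be_to_u64(raw->data_offset, 4);
    return 0;
}
```

A caller would read the first bytes of a PEB into an ec_hdr_raw and pass it to decode_ec_hdr(); a real implementation would also validate the CRC before trusting any field.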

The definition of vidh is shown above, where:

vol_id: the volume this PEB belongs to.

lnum: the number of the LEB within its volume. Together with the PEB's number on the MTD device, this field forms the LEB-to-PEB mapping.

By traversing every PEB of the MTD device, UBI learns the state of each one: used, free, or damaged. This information is recorded temporarily in struct ubi_attach_info. For details of the traversal, see the scan_all() function.
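The classification step of the scan can be sketched as follows. This is a simplified illustration, not the kernel's scan code: classify_peb, peb_state, vid_info and the PEB_CORRUPT state are names invented here, the byte offsets follow the layout of the kernel's struct ubi_vid_hdr as I understand it, and the CRC is not checked. An erased (all 0xFF) header is treated as a free PEB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UBI_VID_HDR_MAGIC 0x55424921u /* ASCII "UBI!" */
#define VID_HDR_SIZE   64
#define VID_OFF_VOL_ID 8   /* assumed byte offset of vol_id */
#define VID_OFF_LNUM   12  /* assumed byte offset of lnum */

/* PEB_CORRUPT is an extra state added for this sketch. */
enum peb_state { PEB_BAD, PEB_FREE, PEB_USED, PEB_CORRUPT };

struct vid_info {
    uint32_t vol_id; /* volume the PEB belongs to */
    uint32_t lnum;   /* LEB number inside that volume */
};

/* Read a 32-bit big-endian integer. */
uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decide the state of one PEB from its bad-block flag and raw vidh bytes.
 * On PEB_USED, *out receives the (vol_id, lnum) pair. */
enum peb_state classify_peb(int is_bad, const uint8_t *vid_hdr,
                            struct vid_info *out)
{
    if (is_bad)
        return PEB_BAD;
    assert(vid_hdr != NULL && out != NULL);

    int erased = 1;
    for (size_t i = 0; i < VID_HDR_SIZE; i++) {
        if (vid_hdr[i] != 0xFF) {
            erased = 0;
            break;
        }
    }
    if (erased)
        return PEB_FREE;

    if (be32(vid_hdr) != UBI_VID_HDR_MAGIC)
        return PEB_CORRUPT;

    out->vol_id = be32(vid_hdr + VID_OFF_VOL_ID);
    out->lnum = be32(vid_hdr + VID_OFF_LNUM);
    return PEB_USED;
}
```

Running this over every PEB yields exactly the kind of per-block record that the text says is collected in struct ubi_attach_info.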

Organizational data structure

After the PEBs have been traversed, the flash information sits in the temporary structure struct ubi_attach_info. It is then transferred into the global structure struct ubi_device *ubi_devices, as follows:

This happens in three steps: initialization of the volumes, of the wear-leveling subsystem, and of the EBA (Eraseblock Association) subsystem.

Volume & EBA subsystem initialization

As mentioned earlier, volume-layout is a volume used internally by UBI, containing two LEBs that back each other up. The figure above shows the content of the corresponding PEBs. The data (gray) part is an array of struct ubi_vtbl_record, which records information about every volume on the current UBI device. The ubi_read_volume_table() function first walks the temporary structure struct ubi_attach_info to find the PEBs that hold volume-layout. It then reads the struct ubi_vtbl_record array and saves it in memory, in the struct ubi_volume *volumes[] field of struct ubi_device. The figure below shows the initialized array: struct ubi_volume *volumes[] is an array of pointers, each element pointing to a struct ubi_volume (see the ubi_read_volume_table() function for details).

struct ubi_volume has an important field, struct ubi_eba_table *eba_tbl, which records the mapping between every LEB and PEB of the volume. Inside it, struct ubi_eba_entry *entries is an array in which each element is a struct ubi_eba_entry. The array index is the LEB number and the element holds the PEB number, which ties each LEB to its PEB (see the ubi_eba_init() function for details).
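The index-to-content relationship described above can be sketched in plain C. This is a toy model rather than kernel code: eba_table, eba_create, eba_map and eba_lookup are names made up for illustration, the nentries field is an addition for bounds checking, and -1 is used to mark an LEB that has no PEB yet.

```c
#include <assert.h>
#include <stdlib.h>

#define LEB_UNMAPPED (-1) /* marker for an LEB with no PEB behind it */

struct eba_entry {
    int pnum; /* PEB number, or LEB_UNMAPPED */
};

struct eba_table {
    struct eba_entry *entries; /* indexed by LEB number */
    int nentries;              /* added here for bounds checks */
};

/* Allocate a table in which every LEB starts out unmapped. */
struct eba_table *eba_create(int nentries)
{
    assert(nentries > 0);
    struct eba_table *tbl = malloc(sizeof(*tbl));
    if (tbl == NULL)
        return NULL;
    tbl->entries = malloc((size_t)nentries * sizeof(*tbl->entries));
    if (tbl->entries == NULL) {
        free(tbl);
        return NULL;
    }
    for (int i = 0; i < nentries; i++)
        tbl->entries[i].pnum = LEB_UNMAPPED;
    tbl->nentries = nentries;
    return tbl;
}

void eba_destroy(struct eba_table *tbl)
{
    if (tbl != NULL) {
        free(tbl->entries);
        free(tbl);
    }
}

/* Record that LEB lnum lives in PEB pnum. Returns 0, or -1 if out of range. */
int eba_map(struct eba_table *tbl, int lnum, int pnum)
{
    if (lnum < 0 || lnum >= tbl->nentries)
        return -1;
    tbl->entries[lnum].pnum = pnum;
    return 0;
}

/* Return the PEB behind LEB lnum, or LEB_UNMAPPED. */
int eba_lookup(const struct eba_table *tbl, int lnum)
{
    if (lnum < 0 || lnum >= tbl->nentries)
        return LEB_UNMAPPED;
    return tbl->entries[lnum].pnum;
}
```

A lookup is a single array access, which is why the mapping can be consulted on every read and write without noticeable cost.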

Wear-leveling subsystem initialization

In UBI a PEB is in one of four states: in use, free, waiting to be erased, or damaged, and the PEBs in each state are managed on separate red-black trees. The ubi_wl_init() function allocates an array of struct ubi_wl_entry pointers and stores it in the struct ubi_wl_entry **lookuptbl field. The array is indexed by PEB number, and each element records that PEB's erase count and number. Every PEB has one such structure, as shown in the following figure.

In addition, each PEB is placed on a red-black tree according to its state. The figure above shows the trees for three states: used, free, and scrub. Each tree is ordered by erase count, with the lowest count leftmost. When two erase counts are equal, the PEB numbers are compared and the smaller goes to the left. The value stored at each node is an element of the struct ubi_wl_entry pointer array.

After ubi_wl_init() returns, the wear-leveling subsystem is initialized and the relationships shown in the figure above exist in memory.
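The ordering rule of these trees (erase count first, PEB number as tie-breaker) can be shown with a comparator. To keep the example short it sorts a plain array with qsort() instead of maintaining a red-black tree, so index 0 plays the role of the leftmost node. The names wl_entry, wl_entry_cmp and wl_sort are illustrative only.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for a wear-leveling entry. */
struct wl_entry {
    int ec;   /* erase count */
    int pnum; /* PEB number */
};

/* Order by erase count, then by PEB number. */
int wl_entry_cmp(const void *a, const void *b)
{
    const struct wl_entry *x = a;
    const struct wl_entry *y = b;

    if (x->ec != y->ec)
        return (x->ec < y->ec) ? -1 : 1;
    if (x->pnum != y->pnum)
        return (x->pnum < y->pnum) ? -1 : 1;
    return 0;
}

/* Sort entries so that entries[0] corresponds to the leftmost tree node. */
void wl_sort(struct wl_entry *entries, size_t n)
{
    assert(entries != NULL);
    qsort(entries, n, sizeof(*entries), wl_entry_cmp);
}
```

A tree gives the same ordering while keeping insertion and removal logarithmic, which matters because entries move between the used, free, and scrub sets all the time.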

UBI layer operation

After the previous initialization, the structural relationship of each data has been saved in memory, so the operation of the UBI layer is actually the operation of the data in memory.

From user space, UBI exposes three kinds of character devices after initialization: /dev/ubi_ctrl, /dev/ubiX (X = 0, 1, 2, ...), and /dev/ubiX_Y (X = 0, 1, 2, ...; Y = 0, 1, 2, ...). Their operation tables are as follows.

ubi_vol_cdev_operations: operates on a single volume (/dev/ubi1_0, for example). From a volume's point of view only the LEBs it contains are visible, so its operations revolve around those blocks.

ubi_cdev_operations: operates on a UBI device (/dev/ubi0, for example). The individual volumes are visible from the UBI device, so this is where volumes are created, deleted, resized, and so on.

ubi_ctrl_cdev_operations: operates on the UBI layer as a whole (/dev/ubi_ctrl). UBI devices are visible from here, so this is where UBI devices are created and deleted.

For instance

Requirement: suppose we want to expand the volume /dev/ubi1_0. How would that be done?

User space passes two parameters, volume_id and size, to kernel space.

In kernel space, the volume's handle is looked up in the struct ubi_volume *volumes[] array by volume_id.

Because the volume is being expanded (more LEBs are needed), the struct ubi_eba_table *eba_tbl array must be reallocated and the data copied from the old array to the new one.

For each new LEB, a PEB is requested from the free tree, the LEB-to-PEB mapping is established and saved in the struct ubi_eba_table *eba_tbl array, and the ech and vidh in the PEB are updated to show which volume the PEB now belongs to.

The steps above are my own sketch, not the kernel implementation (for the real thing, see the ubi_cdev_ioctl() function). The point is that once UBI is initialized, the relationships between volumes and LEBs/PEBs already exist in memory, so in principle any UBI operation can be carried out; what remains is writing the code. Program = algorithm + data structure: the data structures are already there, and the algorithms are the various operations of the UBI layer. Anyone could write that code, with better or worse results, but fortunately the kernel has already done it, and we can read and learn from it. Articles written by others can only give the general idea; the real details are found only in the source code.
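The author's four steps can be modelled in user-space C. Like the outline itself, this is an illustration and not how the kernel resizes a volume: volume, free_pool, find_volume and volume_grow are hypothetical names, the free tree is replaced by a simple array, and the on-flash header update of the last step is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy volume: leb_to_peb[lnum] gives the PEB behind each LEB. */
struct volume {
    int vol_id;
    int nlebs;
    int *leb_to_peb;
};

/* Stand-in for the free tree: a stack of free PEB numbers. */
struct free_pool {
    int *pnums;
    int count;
};

/* Step 2: find the volume by its id. */
struct volume *find_volume(struct volume **vols, int nvols, int vol_id)
{
    for (int i = 0; i < nvols; i++) {
        if (vols[i] != NULL && vols[i]->vol_id == vol_id)
            return vols[i];
    }
    return NULL;
}

/* Steps 3 and 4: grow the mapping table and map the new LEBs.
 * Returns 0 on success, -1 if the request cannot be satisfied. */
int volume_grow(struct volume *vol, int new_nlebs, struct free_pool *pool)
{
    assert(vol != NULL && pool != NULL);
    if (new_nlebs <= vol->nlebs || new_nlebs - vol->nlebs > pool->count)
        return -1;

    int *tbl = malloc((size_t)new_nlebs * sizeof(*tbl));
    if (tbl == NULL)
        return -1;

    /* Copy the existing mappings into the larger table. */
    if (vol->nlebs > 0)
        memcpy(tbl, vol->leb_to_peb, (size_t)vol->nlebs * sizeof(*tbl));

    /* Take one free PEB per new LEB. */
    for (int lnum = vol->nlebs; lnum < new_nlebs; lnum++)
        tbl[lnum] = pool->pnums[--pool->count];

    free(vol->leb_to_peb);
    vol->leb_to_peb = tbl;
    vol->nlebs = new_nlebs;
    return 0;
}
```

Checking the pool size before allocating keeps the operation all-or-nothing: either every new LEB gets a PEB or the volume is left untouched.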

Wear leveling

Flash erase blocks have a limited lifetime. If one PEB is erased and written frequently it will soon wear out, so the purpose of wear leveling is to spread erase operations evenly over the whole flash and extend its service life. How can erasures be spread evenly over the whole flash? Perfect evenness is hard to achieve, so we relax the condition: the difference between the highest and the lowest PEB erase count must stay below a certain value.

For example, the flash in the figure contains 20 PEBs, and the number in each PEB is its erase count. The agreed maximum difference between erase counts is 15. The lowest and highest erase counts in the flash are currently 10 and 39, a difference of 29. Because the threshold is exceeded, we need a way to make the PEB with 10 erasures more likely to be erased and the PEB with 39 erasures less likely to be erased, so that the erase counts across the whole flash even out. The implementation is described below.
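The relaxed condition amounts to a range check over the erase counters. A minimal sketch follows; wl_needed is a made-up helper name, and it scans a plain array rather than the kernel's trees.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if the spread between the highest and lowest erase count
 * has reached the threshold, 0 if the flash is still balanced. */
int wl_needed(const int *ec, size_t n, int threshold)
{
    assert(ec != NULL && n > 0);

    int min = ec[0];
    int max = ec[0];
    for (size_t i = 1; i < n; i++) {
        if (ec[i] < min)
            min = ec[i];
        if (ec[i] > max)
            max = ec[i];
    }
    return (max - min) >= threshold;
}
```

With the numbers from the example (lowest count 10, highest 39, threshold 15) the spread is 29, so the check reports that leveling is needed.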

When wear leveling is triggered

The Linux kernel invokes wear leveling at two points:

When initialization of the wear-leveling subsystem completes, it checks whether leveling is needed. This is the initial state and the first opportunity to check.

When a PEB is about to be erased, its erase count increases, so the wear-leveling condition may now be met. This is the second opportunity to check.

Wear-leveling conditions

Besides the trigger points above, wear leveling is subject to further conditions. The following figure shows the flow chart:

When the scrub red-black tree has nodes, wear leveling must run. During the flash scan, a PEB whose data shows bit flips when read is marked for scrubbing and placed on the scrub tree, meaning that it needs to be erased. Wear leveling then takes the leftmost node e1 of the scrub tree, finds a suitable node e2 in the free tree, and reads the data of the PEB behind e1. If the read fails, the operation ends. Otherwise the e1 data is copied to e2 and e1 is erased, which completes the operation.

When the scrub tree is empty, the leftmost node e1 is taken from the used tree and a suitable node e2 is found in the free tree. The erase counts are then compared: if e2's count exceeds e1's by more than the threshold, the e1 data is copied to e2 and e1 is erased. The reasoning is this. A PEB on the used tree has already been initialized (erased as a whole, then ech and vidh written, then data written with no further erase), so it is not erased while it stays there. A PEB on the free tree is erased once before it is used. So PEBs with high erase counts are moved onto the used tree, where they are less likely to be erased, and PEBs with low erase counts are moved onto the free tree, where they are more likely to be erased. That is how wear leveling is achieved.
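The two branches (scrub tree non-empty, scrub tree empty) can be written as one decision function. This is a sketch under simplifying assumptions: the trees are represented by arrays already sorted by erase count, e2 is simply the last free entry, and plan_wl, wl_plan and wl_action are names invented for this example. It only decides what to move; the copy and the erase are not modelled.

```c
#include <assert.h>
#include <stddef.h>

struct wl_entry {
    int ec;   /* erase count */
    int pnum; /* PEB number */
};

enum wl_action { WL_NONE, WL_SCRUB, WL_MOVE };

struct wl_plan {
    enum wl_action action;
    int src_pnum; /* e1: PEB whose data is copied out, then erased */
    int dst_pnum; /* e2: free PEB that receives the data */
};

/* scrub, used and free_list are sorted ascending by erase count,
 * so index 0 corresponds to the leftmost tree node. */
struct wl_plan plan_wl(const struct wl_entry *scrub, size_t nscrub,
                       const struct wl_entry *used, size_t nused,
                       const struct wl_entry *free_list, size_t nfree,
                       int threshold)
{
    struct wl_plan plan = { WL_NONE, -1, -1 };

    /* Without a free PEB there is nowhere to move data. */
    if (nfree == 0)
        return plan;
    assert(free_list != NULL);

    const struct wl_entry *e2 = &free_list[nfree - 1];

    if (nscrub > 0) {
        /* Scrubbing takes priority and ignores the threshold. */
        plan.action = WL_SCRUB;
        plan.src_pnum = scrub[0].pnum;
        plan.dst_pnum = e2->pnum;
    } else if (nused > 0 && e2->ec - used[0].ec >= threshold) {
        /* Move long-lived data onto the most worn free PEB. */
        plan.action = WL_MOVE;
        plan.src_pnum = used[0].pnum;
        plan.dst_pnum = e2->pnum;
    }
    return plan;
}
```

The key design point is the direction of the move: the least worn used PEB gives up its data to a heavily worn free PEB, so the worn block rests while the fresh block returns to circulation.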

What counts as a suitable node in the free tree? The simplest choice is the rightmost node (the one with the highest erase count), compared with the leftmost node of the used tree to see whether the difference exceeds the threshold. The real code is more involved. The kernel's routine for selecting a node from the free tree (line 29 of the code listing) caps the erase count at that of the leftmost free node plus WL_FREE_MAX_DIFF. According to the comment there, without this cap some situations would keep erasing one or a few PEBs over and over, hence the restriction. I have not worked out exactly which situations those are.
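The capped selection can be sketched as a scan over a sorted array. This is an approximation for illustration, not the kernel routine: the kernel walks a red-black tree, whereas pick_free_entry (a made-up name) scans linearly, and max_diff stands in for WL_FREE_MAX_DIFF.

```c
#include <assert.h>
#include <stddef.h>

struct wl_entry {
    int ec;   /* erase count */
    int pnum; /* PEB number */
};

/* free_list is sorted ascending by erase count. Return the index of the
 * most worn entry whose erase count is still below
 * free_list[0].ec + max_diff. Falls back to index 0. */
size_t pick_free_entry(const struct wl_entry *free_list, size_t n,
                       int max_diff)
{
    assert(free_list != NULL && n > 0);

    int limit = free_list[0].ec + max_diff;
    size_t best = 0;

    for (size_t i = 1; i < n; i++) {
        if (free_list[i].ec >= limit)
            break; /* sorted input: everything after this is too worn */
        best = i;
    }
    return best;
}
```

Compared with always taking the rightmost node, the cap means an outlier with a very high erase count is passed over in favour of a free PEB closer to the least worn one.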
