Source: Shulou (shulou.com), SLTechnology News&Howtos, Servers section. Report dated 06/01; updated 2025-01-31.
This article explains how SSD storage works. Many people have questions about it in day-to-day operations, so the editor has gathered material from various sources and organized it together with some simple, practical commands.
It is well known that SSDs read and write much faster than HDDs. Understanding how an SSD works helps us design efficient storage solutions.
Related Linux commands
fstrim --fstab --verbose    ## reclaim (discard) unused blocks on the mounted filesystems listed in /etc/fstab
blkdiscard /dev/nvme1n1     ## discard every sector of the entire SSD block device
wipefs -a /dev/nvme1n1      ## erase the filesystem signatures
fstrim
The fstrim command can be seen as manually issuing TRIM commands to an SSD. With the -v option it prints how much space was trimmed. fstrim works on mounted filesystems that reside on an SSD.
root@xxxx:~# fstrim --help

Usage:
 fstrim [options] <mount point>

Discard unused blocks on a mounted filesystem.

Options:
 -a, --all           trim all supported mounted filesystems
 -A, --fstab         trim all supported mounted filesystems from /etc/fstab
 -o, --offset <num>  the offset in bytes to start discarding from
 -l, --length <num>  the number of bytes to discard
 -m, --minimum <num> the minimum extent length to discard
 -v, --verbose       print number of discarded bytes
 -n, --dry-run       does everything, but trim

 -h, --help          display this help
 -V, --version       display version

For more details see fstrim(8).
The following is the output after execution, using an NVMe device as an example:
/home: 32.5 GiB (313011310592 bytes) trimmed on /dev/mapper/gat204--vg-root
/boot/efi: 102.2 MiB (607301632 bytes) trimmed on /dev/nvme1n1p1
/boot: 732.5 MiB (825778176 bytes) trimmed on /dev/nvme1n1p2
/: 60.7 GiB (65154805760 bytes) trimmed on /dev/mapper/gat204--vg-swap_1
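To see the total space reclaimed across all filesystems, the exact byte counts in the verbose output can be added up. The following is a minimal sketch, not part of fstrim itself: the function name sum_trimmed_bytes is hypothetical, and it assumes every result line carries the byte count in parentheses as in the sample output above.

```shell
#!/bin/sh
# Sum the exact byte counts from `fstrim -v` style output read on stdin.
# Assumption: each result line contains "(<number> bytes)" as in the sample.
sum_trimmed_bytes() {
    total=0
    while IFS= read -r line; do
        # Extract the digits between "(" and " bytes)"; empty if the line has none.
        bytes=$(printf '%s\n' "$line" | sed -n 's/.*(\([0-9][0-9]*\) bytes).*/\1/p')
        if [ -n "$bytes" ]; then
            total=$((total + bytes))
        fi
    done
    echo "$total"
}
```

It could be used as `fstrim -av | sum_trimmed_bytes`; fstrim itself still needs root privileges.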
On Ubuntu and Debian, a systemd timer can run fstrim periodically, so there is no need for a hand-written crontab script.
systemctl status fstrim.timer    ## query the timer status
systemctl enable fstrim.timer    ## enable periodic TRIM
blkdiscard
blkdiscard discards sectors on an SSD device. Unlike fstrim, it operates directly on a block device and by default discards all sectors of the entire device, so every bit of data on it is lost.
root@xxxx:~# blkdiscard --help

Usage:
 blkdiscard [options] <device>

Discard the content of sectors on a device.

Options:
 -o, --offset <num>  offset in bytes to discard from
 -l, --length <num>  length of bytes to discard from the offset
 -p, --step <num>    size of the discard iterations within the offset
 -s, --secure        perform secure discard
 -z, --zeroout       zero-fill rather than discard
 -v, --verbose       print aligned length and offset

 -h, --help          display this help
 -V, --version       display version

For more details see blkdiscard(8).
root@ECSab169d:~# man blkdiscard
Nothing is printed when the discard succeeds:
root@xxxx:~# blkdiscard /dev/nvme1n1
root@xxxx:~#
wipefs
wipefs ships with Linux and erases filesystem signatures from a device without erasing the filesystem itself or any other data on the device. By default, wipefs does not erase nested partition tables on devices that are not whole disks; that requires the --force option.
root@gat204:~# wipefs --help

Usage:
 wipefs [options] <device>

Wipe signatures from a device.

Options:
 -a, --all           wipe all magic strings (BE CAREFUL!)
 -b, --backup        create a signature backup in $HOME
 -f, --force         force erasure
 -i, --noheadings    don't print headings
 -J, --json          use JSON output format
 -n, --no-act        do everything except the actual write() call
 -o, --offset <num>  offset to erase, in bytes
 -O, --output <list> COLUMNS to display (see below)
 -p, --parsable      print out in parsable instead of printable format
 -q, --quiet         suppress output messages
 -t, --types <list>  limit the set of filesystem, RAIDs or partition tables

 -h, --help          display this help
 -V, --version       display version

Available output columns:
 UUID      partition/filesystem UUID
 LABEL     filesystem LABEL
 LENGTH    magic string length
 TYPE      superblok type
 OFFSET    magic string offset
 USAGE     type description
 DEVICE    block device name

For more details see wipefs(8).
Check whether an SSD supports TRIM
## TRIM support can be determined from the information under /sys/block: if discard_granularity is not 0, TRIM is supported.
# cat /sys/block/sda/queue/discard_granularity
0
# cat /sys/block/nvme0n1/queue/discard_granularity
512
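The same check can be run over every block device at once. This is a minimal sketch under stated assumptions: the function name report_trim_support is hypothetical, and the optional directory argument exists only so the function can be tried against a sample directory tree; it defaults to /sys/block.

```shell
#!/bin/sh
# Report whether each block device advertises discard (TRIM) support,
# using the discard_granularity rule described above (non-zero = supported).
report_trim_support() {
    sys_block="${1:-/sys/block}"
    for dev_dir in "$sys_block"/*; do
        gran_file="$dev_dir/queue/discard_granularity"
        # Skip entries that do not expose the attribute.
        [ -r "$gran_file" ] || continue
        gran=$(cat "$gran_file")
        if [ "$gran" -gt 0 ]; then
            echo "$(basename "$dev_dir"): TRIM supported (granularity $gran bytes)"
        else
            echo "$(basename "$dev_dir"): TRIM not supported"
        fi
    done
}
```

Calling `report_trim_support` with no argument scans the real /sys/block.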
Storage media (NAND flash cell types)
SSDs store data in NAND flash cells, which come in SLC, MLC, TLC and QLC variants. A flash cell can be roughly pictured as a capacitor combined with a voltmeter: the capacitor holds a charge, and the more voltage levels the voltmeter can tell apart, the more bits a single cell can store.
SLC (Single-Level Cell): each cell stores 1 bit, so there are only two voltage states, 0 and 1. The structure is simple and voltage control is fast, which gives long life and strong performance. P/E endurance is between 10,000 and 100,000 cycles; the drawbacks are low capacity and high cost.
MLC (Multi-Level Cell): each cell stores 2 bits, which requires more complex voltage control across four states and therefore lowers write performance and reliability. P/E endurance ranges from 3,000 to 5,000 cycles depending on the manufacturing process.
TLC (Triple-Level Cell): each cell stores 3 bits, with eight voltage states from 000 to 111. Capacity per cell is 1.5 times that of MLC and the cost is lower, but the architecture is more complex, P/E programming takes longer, writes are slower, and P/E endurance drops to 1,000-3,000 cycles in some cases. The short life is only relative: generally speaking, thoroughly tested TLC flash can be used for more than 5 years without problems.
QLC (Quad-Level Cell): also called 4-bit MLC. There are 16 voltage states and capacity increases by 33% over TLC, but write performance and P/E endurance are reduced further compared with TLC. Micron has run tests on the specific performance. For reads, both types reach 540 MB/s over a SATA interface. QLC does worse on writes, because its P/E programming time is longer and slower than that of MLC and TLC: sequential write speed falls from 520 MB/s to 360 MB/s and random performance from 9,500 IOPS to 5,000 IOPS, a loss of nearly half.
Among these four types, SLC has the best performance and the highest price. MLC offers sufficient performance at a moderate price and is the mainstream choice for consumer-grade SSDs. TLC has lower overall performance and a lower price, although a high-performance controller and controller algorithms can compensate for and improve TLC performance. QLC was proposed early on and is cheap with large capacity.
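The voltage-state counts quoted for each cell type follow directly from the number of bits per cell: n bits need 2^n distinguishable states. A small sketch of that arithmetic (the function name levels_per_cell is hypothetical):

```shell
#!/bin/sh
# Number of voltage states a cell must distinguish to store the given bits: 2^bits.
levels_per_cell() {
    echo $((1 << $1))
}

for entry in SLC:1 MLC:2 TLC:3 QLC:4; do
    name=${entry%%:*}   # text before the colon
    bits=${entry##*:}   # text after the colon
    echo "$name: $bits bit(s) per cell, $(levels_per_cell "$bits") voltage states"
done
```

Each extra bit doubles the number of states that must fit in the same voltage range, which is why control gets harder and endurance falls from SLC to QLC.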
P/E and the underlying storage structure of an SSD
P stands for Program and E for Erase. One complete program-and-erase of flash memory is one P/E cycle, so flash endurance is measured in P/E cycles. Unlike an HDD, where data can be overwritten in place, an SSD must erase before it can write. An SSD has usually been fully erased already, either at the factory or while the filesystem was being created, so the first write to an SSD programs the pages directly.
An SSD accesses flash in units of pages and blocks: the SSD is divided into many blocks, and each block is divided into many pages.
NAND flash read and write process
A page is the smallest read/write unit of NAND flash, typically 4 KB or a multiple of 4 KB. Data can only be written to empty pages, while erasing works on whole blocks. Each block tolerates a limited number of erases; once the limit is exceeded it becomes a bad block.
User writes to an SSD fall into two cases:
1. The SSD holds no data yet, and new data is written.
2. The SSD already holds data, and that data is modified (including deleted).
The former simply writes the data to blank pages. The latter works in read-modify-write fashion: the contents of the original page are read into the cache and updated, the result is written to a different empty page, and the original page is marked invalid.
Clearly, repeatedly modifying a file produces a large number of invalid pages. A garbage collection (GC) mechanism is needed to reclaim them; otherwise less and less space remains available for writing.
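How quickly invalid pages pile up can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not how any particular controller works: the function name count_pages is hypothetical, each argument is the number of a logical page being written, and deletes and garbage collection are not modeled.

```shell
#!/bin/sh
# Toy model of out-of-place updates: the first write of a logical page uses one
# fresh flash page; every rewrite uses another fresh page and invalidates the old copy.
count_pages() {
    seen=""
    valid=0
    invalid=0
    for lba in "$@"; do
        case " $seen " in
            *" $lba "*)
                # Rewrite of existing data: the previous copy becomes an invalid page.
                invalid=$((invalid + 1))
                ;;
            *)
                # First write of this logical page.
                seen="$seen $lba"
                valid=$((valid + 1))
                ;;
        esac
    done
    echo "valid=$valid invalid=$invalid"
}

# Logical page 1 is written three times, pages 2 and 3 once each.
count_pages 1 2 1 3 1
```

Five writes consume five flash pages even though only three logical pages hold live data; the other two can only be recovered by garbage collection.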
FTL and wear leveling
The SSD controller applies a wear-leveling strategy so that erase cycles are distributed evenly across all blocks of the drive. Much like an MMU for memory, the SSD internally uses a flash translation layer (FTL) to store the mapping from logical block addresses (LBA) to physical block addresses (PBA). The disk addresses that the operating system accesses are all logical addresses; only after translation by the FTL do they become physical addresses that locate the actual block. The operating system does not need to consider block wear at all and simply reads and writes as it would on a mechanical hard disk.
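One simple way to picture wear leveling is a greedy rule: when a free block is needed, take the one that has been erased the fewest times. The sketch below shows only that rule; the function name pick_least_worn_block is hypothetical, and real controllers use more elaborate policies that the source does not describe.

```shell
#!/bin/sh
# Print the zero-based index of the block with the fewest erase cycles.
# Arguments: one erase count per block. Ties go to the lowest index.
pick_least_worn_block() {
    best_index=0
    best_count=$1
    index=0
    for count in "$@"; do
        if [ "$count" -lt "$best_count" ]; then
            best_count=$count
            best_index=$index
        fi
        index=$((index + 1))
    done
    echo "$best_index"
}

# Four blocks erased 3, 7, 2 and 9 times: the block at index 2 is chosen.
pick_least_worn_block 3 7 2 9
```

Because the FTL hides physical addresses from the operating system, the controller is free to redirect a write to whichever block this kind of rule selects.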
Garbage collection mechanism
Write amplification
As described above, repeated modification of data produces a large number of invalid pages. Once a block no longer has enough free pages for the incoming data, the SSD reads the block's data into its cache, erases the block, and then writes the updated data back from the cache. In this read-erase-modify-write sequence the host may have written only a single 4 KB page, yet the SSD actually erased and rewrote N pages. This is called write amplification.
The larger the write amplification, the slower the writes.
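Write amplification is usually expressed as a ratio: bytes actually written to flash divided by bytes the host asked to write. The sketch below computes that ratio for the worst case described above; the function name write_amplification is hypothetical, and the figure of 128 pages per block is an assumption chosen for illustration, not a value from the source.

```shell
#!/bin/sh
# Write amplification factor = bytes written to flash / bytes written by the host.
write_amplification() {
    host_bytes=$1
    nand_bytes=$2
    awk -v h="$host_bytes" -v n="$nand_bytes" 'BEGIN { printf "%.1f\n", n / h }'
}

page_size=4096        # one 4 KB page written by the host
pages_per_block=128   # assumption: pages per block varies between flash parts

# Worst case: a single-page update forces the whole block to be rewritten.
write_amplification "$page_size" "$((page_size * pages_per_block))"
```

A factor of 1.0 means the flash wrote exactly what the host requested; anything higher costs both speed and P/E cycles.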
TRIM command
TRIM is an ATA-8 command for SSDs and is the key to reducing write amplification.
When data is modified or deleted, the filesystem notifies the SSD, which records the invalid pages that result. At intervals, the SSD then reclaims all invalid pages in bulk by erasing the blocks that contain them.
This helps in two ways. First, it keeps enough free space in reserve to avoid write amplification caused by a lack of space. Second, with TRIM the invalid pages are reclaimed and erased while I/O is idle, which preserves SSD performance and extends its life.
The difference between discard and TRIM
In Linux terminology, discard refers to TRIM.
Using the Linux system's built-in TRIM feature is not recommended
TRIM can be enabled in two ways. One is continuous TRIM, which issues a TRIM command as soon as the filesystem frees blocks; it is turned on by changing defaults to discard in the mount options in /etc/fstab, and it has a large performance impact. The other is to run fstrim periodically so that TRIM operations are performed in batches and the day-to-day performance impact is avoided. The time at which fstrim runs should still be chosen carefully, because a batch TRIM has a considerable impact on other tasks while it runs.
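To find out which of the two approaches a filesystem currently uses, its active mount options can be inspected. This is a minimal sketch: the function name trim_mode and its output strings are hypothetical, the optional second argument exists only so a sample file can be used in place of /proc/mounts, and only the plain discard option is recognized.

```shell
#!/bin/sh
# Print "continuous" if the mount point is mounted with the discard option,
# otherwise "periodic-or-none" (periodic fstrim, or no TRIM at all).
trim_mode() {
    mount_point=$1
    mounts_file="${2:-/proc/mounts}"
    # In /proc/mounts, field 2 is the mount point and field 4 the option list.
    opts=$(awk -v mp="$mount_point" '$2 == mp { print $4 }' "$mounts_file" | tail -n 1)
    case ",$opts," in
        *,discard,*) echo "continuous" ;;
        *) echo "periodic-or-none" ;;
    esac
}
```

For example, `trim_mode /` reports on the root filesystem. If the result is periodic-or-none, `systemctl status fstrim.timer` shows whether batched TRIM is scheduled.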
The article "Ubuntu Doesn't TRIM SSDs By Default: Why Not and How To Enable It Yourself" quotes the following:
"The kernel implementation of realtime trim in 11.2, 11.3, and 11.4 is not optimized. The spec. calls for trim supporting a vectorized list of trim ranges, but as of kernel 3.0 trim is only invoked by the kernel with a single discard / trim range and with current mid 2011 SSDs this has proven to cause a performance degradation instead of a performance increase. There are few reasons to use the kernels realtime discard support with pre-3.1 kernels. It is not known when the kernels discard functionality will be optimized to work beneficially with current generation SSDs."
In other words, the kernel-based discard approach does not take into account its impact on the SSD's current performance.
Practice
Testing a raw NVMe device with fio
fio was run directly against the raw device. After a little more than 30 minutes the write speed fell from 400 MiB/s to 80 MiB/s, from which we conclude that the SSD had run into write amplification. Because no filesystem is mounted, fstrim cannot be used to reclaim space manually (without a filesystem to mark them, the SSD does not know which pages are invalid). A second fio run still measured 80 MiB/s. After the whole device was discarded with blkdiscard, the speed returned to normal.
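The source does not give the fio parameters that were used, so the following is only a sketch of what such a test could look like. Every fio option value (sequential writes, 1 MiB blocks, libaio, queue depth 32, 1800 seconds) is an assumption, and the function name raw_write_test is hypothetical. By default the function only prints the commands, because both of them destroy all data on the device when actually executed.

```shell
#!/bin/sh
# Dry-run sketch of the raw-device test: commands are printed, not executed,
# unless an empty string is passed as the second argument.
# WARNING: executed for real, both commands destroy all data on the device.
raw_write_test() {
    dev=$1
    run=${2-echo}   # default "echo" prints; "" executes the commands
    # Sustained direct writes to the raw device (all option values are assumptions).
    $run fio --name=rawwrite --filename="$dev" --rw=write --bs=1M \
        --direct=1 --ioengine=libaio --iodepth=32 --time_based --runtime=1800
    # Discard the whole device so the SSD knows every page is free again.
    $run blkdiscard "$dev"
}
```

Running `raw_write_test /dev/nvme1n1` prints the two commands for review before anything is touched.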
Concluding remarks
After a write test with fio directly against a raw SSD, running blkdiscard on the disk restores the original speed.
That concludes this look at how SSD storage works. Theory is best learned together with practice, so go and try it yourself.