How to check the lifespan and health of an SSD under CentOS

2025-01-16 Update From: SLTechnology News&Howtos

This article explains how to check the remaining life and judge the health of an SSD under CentOS, including drives that sit behind a RAID card. Most people are not very familiar with the procedure, so it is written up here for reference.

Almost every guide on the Internet about checking SSD life covers only Intel SSDs. That is unfair to those of us who can only afford Crucial and OCZ drives. And if the drives sit behind a RAID card, is there really no way to see the remaining life of other vendors' SSDs?

After some research, it turns out that whenever an SSD sits behind a RAID card, checking it requires both MegaCli and smartctl. The RAID cards I currently use are the LSI Logic / Symbios Logic MegaRAID SAS 1078 and 2108, which are queried with the usual MegaCli tool.

MegaCli can be downloaded here:

MegaCli for CentOS 5

MegaCli for CentOS 6

The whole process has two steps. First, get the list of physical drives from the RAID card. Then use smartctl to display the details of each drive.

Use MegaCli to get the information of the hard drive under the RAID card:

Run the following command:

/opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL

This lists every physical drive behind the RAID card. The record for each drive looks like this:

Enclosure Device ID: 252
Slot Number: 7
Device Id: 28
Sequence Number: 2
Media Error Count: 0
Other Error Count: 1
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 119.242 GB [0xee7c2b0 Sectors]
Non Coerced Size: 118.742 GB [0xed7c2b0 Sectors]
Coerced Size: 118.277 GB [0xec8e000 Sectors]
Firmware state: Online, Spun Up
SAS Address (0): 0x1e394d57aa996b80
Connected Port Number: 7 (path0)
Inquiry Data: 0000000011070303A99EC300-CTFDDAC128MAG 0007
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 1.5Gb/s
Media Type: Solid State Device

The command prints one such record for every drive, so the output can be long. Only records with Media Type: Solid State Device are SSDs. Write down the Device Id (28 here); it is needed later for the smartctl query. The drive model appears in the Inquiry Data line: 0000000011070303A99EC300-CTFDDAC128MAG 0007. The Firmware state line shows whether the SSD is working normally (Online, Spun Up), so an SSD monitoring alarm can simply watch this field.
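
As a rough sketch of the monitoring idea described above, the shell functions below pick the SSD records out of PDList output and flag any SSD whose firmware state is not Online, Spun Up. The function names list_ssds and unhealthy_ssds are made up for this illustration, and the parsing assumes that each record lists Slot Number, Device Id and Firmware state before Media Type, as in the sample output; other MegaCli versions may order the fields differently.

```shell
# list_ssds: read "MegaCli64 -PDList -aALL" output on stdin and print one line
# per SSD in the form "<slot> <device id> <firmware state>".
# ASSUMPTION: Media Type is the last of these four fields in every record.
list_ssds() {
    awk -F': *' '
        { sub(/[ \t\r]+$/, "") }              # drop trailing whitespace
        /^Slot Number:/    { slot = $2 }
        /^Device Id:/      { id = $2 }
        /^Firmware state:/ { state = $2 }
        /^Media Type:/ {
            if ($2 == "Solid State Device") print slot, id, state
            slot = id = state = ""            # start a fresh record
        }
    '
}

# unhealthy_ssds: same input, but print only SSDs that are not
# "Online, Spun Up". Returns 1 if at least one such drive was found.
unhealthy_ssds() {
    list_ssds | awk '
        { state = $0; sub(/^[^ ]+ [^ ]+ /, "", state) }   # strip slot and id
        state != "Online, Spun Up" { print; bad = 1 }
        END { exit bad + 0 }
    '
}
```

With MegaCli installed at the path used above, piping /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL into unhealthy_ssds should print nothing and return 0 while every SSD is healthy, which makes it easy to call from cron or a monitoring agent.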

Use smartctl to get the details of the SSD hard drive

Note that the attributes reported differ between manufacturers and drive models. Intel drives are not covered here. In the query command below, -a prints all SMART information, -d sets the device type, and -s on enables SMART on the drive. Different RAID cards may use different interfaces, so the exact arguments can vary slightly.

For example, Intel drives work with plain -d megaraid,27. With the RAID cards above, however, I also have to specify sat, like this:

smartctl -a -d sat+megaraid,27 /dev/sdb1 -s on

Here sat stands for SCSI to ATA Translation; other device types accepted by -d include scsi and ata. The number after megaraid, is the Device Id reported by MegaCli.
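
To query several drives in one go, the command line can be generated from a list of Device Ids. The sketch below only prints the smartctl commands rather than running them, so it can be checked before use. The function name smart_cmds and its argument layout are invented for this example; the sat+ prefix is the default because that is what the RAID cards in this article needed.

```shell
# smart_cmds: read MegaRAID Device Ids on stdin (one per line, extra columns
# are ignored) and print a smartctl command line for each of them.
#   $1 = block device exposed by the RAID controller, e.g. /dev/sdb
#   $2 = optional device type prefix; defaults to "sat+", pass "" for drives
#        that answer to plain "megaraid,N"
smart_cmds() {
    dev=$1
    prefix=${2-sat+}
    while read -r id rest; do
        [ -n "$id" ] || continue      # skip empty lines
        printf 'smartctl -a -d %smegaraid,%s %s\n' "$prefix" "$id" "$dev"
    done
}
```

For instance, echo 28 | smart_cmds /dev/sdb prints the command for the drive found earlier; piping the result into sh would execute it.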

At this point, the following information is displayed:

Model Family: Crucial/Micron RealSSD C300/C400
Device Model: C300-CTFDDAC128MAG

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 000 Pre-fail Always - 0
5 Reallocated_Sector_Ct 0x0033 100 000 Pre-fail Always - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 5572
12 Power_Cycle_Count 0x0032 100 000 Old_age Always - 3
170 Grown_Failing_Block_Ct 0x0033 100 000 Pre-fail Always - 0
171 Program_Fail_Count 0x0032 100 100 000 Old_age Always - 0
172 Erase_Fail_Count 0x0032 100 000 Old_age Always - 0
173 Wear_Levelling_Count 0x0033 090 090 000 Pre-fail Always - 536
174 Unexpect_Power_Loss_Ct 0x0032 100 000 Old_age Always - 1
181 Non4k_Aligned_Access 0x0022 100 100 000 Old_age Always - 000
183 SATA_Iface_Downshift 0x0032 100 000 Old_age Always - 0
184 End-to-End_Error 0x0033 100 100 000 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0
189 Factory_Bad_Block_Ct 0x000e 100 100 000 Old_age Always - 250
195 Hardware_ECC_Recovered 0x003a 100 000 Old_age Always - 0
196 Reallocated_Event_Count 0x0032 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0036 100 000 Old_age Always - 0
202 Perc_Rated_Life_Used 0x0018 090 090 000 Old_age Offline - 10
206 Write_Error_Rate 0x000e 100 000 Old_age Always - 0

For an OCZ drive, the output looks like this:

Device Model: OCZ-AGILITY3
Serial Number: OCZ-1OX963Q8B5X2V684

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 086 086 050 Pre-fail Always - 135388659
5 Reallocated_Sector_Ct 0x0033 100 100 003 Pre-fail Always - 9
9 Power_On_Hours 0x0032 100 000 Old_age Always - 265772576277126
12 Power_Cycle_Count 0x0032 100 000 Old_age Always - 15
171 Unknown_Attribute 0x0032 000 000 000 Old_age Always - 9
172 Unknown_Attribute 0x0032 000 000 Old_age Always - 0
174 Unknown_Attribute 0x0030 000 000 Old_age Offline - 13
177 Wear_Leveling_Count 0x0000 000 000 Old_age Offline - 1
181 Program_Fail_Cnt_Total 0x0032 000 000 000 Old_age Always - 9
182 Erase_Fail_Count_Total 0x0032 000 000 000 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
194 Temperature_Celsius 0x0022 030 030 000 Old_age Always - 30 (Lifetime Min/Max 30/30)
195 Hardware_ECC_Recovered 0x001c 120 000 Old_age Offline - 135388659
196 Reallocated_Event_Count 0x0033 100 100 003 Pre-fail Always - 9
201 Soft_Read_Error_Rate 0x001c 120 000 Old_age Offline - 135388659
204 Soft_ECC_Correction 0x001c 120 000 Old_age Offline - 135388659
230 Head_Amplitude 0x0013 100 000 Pre-fail Always - 100
231 Temperature_Celsius 0x0013 100 100 010 Pre-fail Always - 0
233 Media_Wearout_Indicator 0x0000 000 000 Old_age Offline - 2531
234 Unknown_Attribute 0x0032 000 000 Old_age Always - 3465
241 Total_LBAs_Written 0x0032 000 000 Old_age Always - 3465
242 Total_LBAs_Read 0x0032 000 000 Old_age Always - 2030
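
For scripting, a single attribute can be pulled out of a table like the ones above by its ID. The helper below is a minimal sketch with a made-up name (smart_attr); it assumes complete ten-column rows in the layout given by the header line (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE), as smartctl normally prints them.

```shell
# smart_attr: read a smartctl attribute table on stdin and print
# "<normalized VALUE> <RAW_VALUE>" for the attribute ID given as $1.
# ASSUMPTION: rows have all ten columns; shorter rows are skipped.
smart_attr() {
    awk -v id="$1" '
        # $4 is the normalized VALUE column, $10 the first RAW_VALUE word;
        # adding 0 turns "090" into the number 90.
        $1 == id && NF >= 10 { print $4 + 0, $10; exit }
    '
}
```

A call such as smartctl -A -d sat+megaraid,27 /dev/sdb1 | smart_attr 173 would then yield the normalized and raw Wear_Levelling_Count of a Crucial drive on one line.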

Parameter analysis of whether the SSD is healthy:

Note that remaining life is not reported the same way on every drive: Intel SSDs use the Media_Wearout_Indicator attribute (OCZ drives have it too), while on Crucial drives the equivalent is Perc_Rated_Life_Used. In practice, though, whether an SSD is healthy is best judged from two attributes: Wear_Levelling_Count (the average number of erase cycles of the flash) and Grown_Failing_Block_Ct.

Notice the following two lines:

170 Grown_Failing_Block_Ct 0x0033 100 000 Pre-fail Always - 0

173 Wear_Levelling_Count 0x0033 090 090 000 Pre-fail Always - 536

These two attributes are the key:

Wear_Levelling_Count (average erase count of the flash): this is the more important of the two. The drive shown here is an SSD that has been in use for one year. The raw value is 536, meaning the flash on this 128 GB drive has averaged 536 write/erase cycles, and the normalized value shows 90% of its life remaining. Since 536 cycles correspond to about 10% of the rated endurance, the flash used in this drive is rated for a little more than 5,000 cycles, which is why the value reads 90.

Grown_Failing_Block_Ct (bad blocks that have developed in use): this counts the flash blocks that have gone bad while the SSD was in use, similar to bad sectors on an HDD. A value of 0 means there are no bad blocks; a nonzero value means bad blocks have appeared, which shortens the drive's life. If this value climbs sharply within a short period on a newly purchased SSD under normal use, there is probably something wrong with the drive, so contact after-sales service as soon as possible.
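
The arithmetic behind the Wear_Levelling_Count reading can be sketched in a few lines of shell. Both function names are hypothetical, and the rated cycle count is an assumption: the drive does not report it, and 5000 is only the estimate used in this article. awk does the arithmetic so that zero-padded values such as 090 are read as decimal numbers.

```shell
# wear_used_pct: whole percent of the rated erase cycles already consumed.
#   $1 = raw Wear_Levelling_Count (average erase cycles so far)
#   $2 = rated erase cycles (ASSUMPTION: estimated, not reported by the drive)
wear_used_pct() {
    awk -v raw="$1" -v rated="$2" 'BEGIN {
        if (rated <= 0) exit 1
        printf "%d\n", int(raw * 100 / rated)   # truncated, not rounded
    }'
}

# est_rated_cycles: the reverse estimate, from the raw count and the
# normalized VALUE column (100 = new, so 100 - VALUE = percent used).
est_rated_cycles() {
    awk -v raw="$1" -v val="$2" 'BEGIN {
        used = 100 - val
        if (used <= 0) exit 1                   # nothing used yet, no estimate
        printf "%d\n", int(raw * 100 / used)
    }'
}
```

With the numbers from this drive, wear_used_pct 536 5000 gives 10 (so 90% remains), and est_rated_cycles 536 090 gives 5360, consistent with flash rated for a little more than 5,000 cycles.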

Common MegaCli command combinations:

MegaCli -cfgdsply -aALL | grep "Error" [error counts; normally 0]
MegaCli -LDGetProp -Cache -LALL -a0 [write policy]
MegaCli -cfgdsply -aALL | grep "Memory" [memory size]
MegaCli -LDInfo -Lall -aALL [check RAID level]
MegaCli -AdpAllInfo -aALL [check RAID card information]
MegaCli -PDList -aALL [view hard disk information]
MegaCli -AdpBbuCmd -aAll [view battery information]
MegaCli -FwTermLog -Dsply -aALL [view RAID card log]
MegaCli -adpCount [display the number of adapters]
MegaCli -AdpGetTime -aALL [display adapter time]
MegaCli -AdpAllInfo -aAll [display all adapter information]
MegaCli -LDInfo -LALL -aAll [display all logical disk group information]
MegaCli -PDList -aAll [display all physical disk information]
MegaCli -AdpBbuCmd -GetBbuStatus -aALL | grep "Charger Status" [view charging status]
MegaCli -AdpBbuCmd -GetBbuStatus -aALL [display BBU status information]
MegaCli -AdpBbuCmd -GetBbuCapacityInfo -aALL [display BBU capacity information]
MegaCli -AdpBbuCmd -GetBbuDesignInfo -aALL [display BBU design parameters]
MegaCli -AdpBbuCmd -GetBbuProperties -aALL [display current BBU properties]
MegaCli -cfgdsply -aALL [display RAID card model, RAID settings and disk information]

State changes of a disk, from being pulled out to being reinserted:

Device | Normal | Damaged | Rebuild | Normal
Virtual Drive | Optimal | Degraded | Degraded | Optimal
Physical Drive | Online | Failed -> Unconfigured | Rebuild | Online
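
The Virtual Drive row of this table can be watched in the same way as the physical drive state. The sketch below reads MegaCli -LDInfo -Lall -aALL output and reports every virtual drive that is not Optimal. The function name degraded_vds is invented here, and the line formats it looks for ("Virtual Drive: N ..." or "Virtual Disk: N ...", followed by "State : ...") are an assumption about the LDInfo output that should be checked against the MegaCli version in use.

```shell
# degraded_vds: read "MegaCli -LDInfo -Lall -aALL" output on stdin and print
# "<virtual drive number> <state>" for every drive that is not Optimal.
# Returns 1 if at least one such drive was found.
# ASSUMPTION: each drive is introduced by a "Virtual Drive: N" (or
# "Virtual Disk: N") line followed by a "State : <value>" line.
degraded_vds() {
    awk '
        /^Virtual (Drive|Disk):/ { vd = $3 }
        /^State[ \t]*:/ {
            state = $0
            sub(/^State[ \t]*:[ \t]*/, "", state)   # keep only the value
            sub(/[ \t\r]+$/, "", state)             # drop trailing whitespace
            if (state != "Optimal") { print vd, state; bad = 1 }
        }
        END { exit bad + 0 }
    '
}
```

During a rebuild this would keep reporting the drive as Degraded until the table's final Optimal state is reached.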
