The Aerospike Knowledge Base has moved to https://support.aerospike.com. Content on https://discuss.aerospike.com is being migrated to either https://support.aerospike.com or https://docs.aerospike.com. Maintenance on articles stored in this repository ceased on December 31st 2022 and this article may be stale. If you have any questions, please do not hesitate to raise a case via https://support.aerospike.com.
Decoding smartctl disk diagnostic output
Installing smartctl
Most Linux distributions ship smartctl in the smartmontools package. For example, on an RPM-based system you can use:
yum install smartmontools
Otherwise, you can compile from the source code directly:
wget -O smart.tar.gz 'https://sourceforge.net/projects/smartmontools/files/latest/download?source=files'
tar -xvf smart.tar.gz
cd smartmontools-*/    # the directory name matches the version that was downloaded
./configure
make
sudo make install
Running smartctl to get all SMART stats of a device
sudo /usr/local/sbin/smartctl -a /dev/sdc
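Running smartctl to get all stats available for a device, including non-SMART information
sudo /usr/local/sbin/smartctl -x /dev/sdc
(The /usr/local/sbin path matches the source build above; a package install usually places smartctl in /usr/sbin.)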
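smartctl also reports problems through its exit status, which is a bitmask. The sketch below is a hypothetical POSIX shell helper, decode_smartctl_status, that prints which bits are set. The bit meanings are paraphrased from the EXIT STATUS section of the smartctl(8) man page, so check the man page for your version before relying on them.

```shell
# decode_smartctl_status (hypothetical helper): explain the bits set in
# smartctl's exit status. Bit meanings paraphrase the EXIT STATUS section
# of the smartctl(8) man page.
decode_smartctl_status() {
  _status=$1
  if [ "$_status" -eq 0 ]; then
    echo "no problems reported"
    return 0
  fi
  _bit=0
  while [ "$_bit" -le 7 ]; do
    # Test whether bit number $_bit is set in the status value.
    if [ $(( (_status >> _bit) & 1 )) -eq 1 ]; then
      case $_bit in
        0) echo "bit 0: command line did not parse" ;;
        1) echo "bit 1: device open failed, returned no IDENTIFY data, or is in low-power mode" ;;
        2) echo "bit 2: a SMART/ATA command failed or SMART data had a checksum error" ;;
        3) echo "bit 3: SMART health status is DISK FAILING" ;;
        4) echo "bit 4: a Pre-fail attribute is at or below its threshold now" ;;
        5) echo "bit 5: health OK, but an attribute was at or below its threshold in the past" ;;
        6) echo "bit 6: the device error log contains errors" ;;
        7) echo "bit 7: the self-test log contains errors" ;;
      esac
    fi
    _bit=$((_bit + 1))
  done
}
```

For example, run sudo /usr/local/sbin/smartctl -a /dev/sdc > /dev/null and then decode_smartctl_status $? in the same shell.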
SMART DATA attributes table
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0032 094 094 000 Old_age Always - 2
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 15537
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 17
170 Unknown_Attribute 0x0033 099 099 010 Pre-fail Always - 0
171 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 1
172 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
174 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 14
175 Program_Fail_Count_Chip 0x0033 100 100 010 Pre-fail Always - 391406158447
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0
184 End-to-End_Error 0x0033 100 100 090 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 082 071 000 Old_age Always - 18 (Min/Max 16/30)
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 14
194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 18
197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 0
199 UDMA_CRC_Error_Count 0x003e 100 100 000 Old_age Always - 0
225 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 10476538
226 Load-in_Time 0x0032 100 100 000 Old_age Always - 65535
227 Torq-amp_Count 0x0032 100 100 000 Old_age Always - 4294967295
228 Power-off_Retract_Count 0x0032 100 100 000 Old_age Always - 65535
232 Available_Reservd_Space 0x0033 099 099 010 Pre-fail Always - 0
233 Media_Wearout_Indicator 0x0032 001 001 000 Old_age Always - 0
234 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
241 Total_LBAs_Written 0x0032 100 100 000 Old_age Always - 10476538
242 Total_LBAs_Read 0x0032 100 100 000 Old_age Always - 19054896
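To pull a single value out of this table in a script, a small awk filter is enough. The helper below (smart_raw is a hypothetical name) assumes the default 10-column layout shown above and matches on the ATTRIBUTE_NAME column. It prints only the first word of RAW_VALUE, so a value such as "18 (Min/Max 16/30)" comes back as 18.

```shell
# smart_raw NAME (hypothetical helper): print the RAW_VALUE of attribute NAME
# from smartctl attribute-table output read on stdin. Assumed column layout:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
smart_raw() {
  awk -v name="$1" '
    # Attribute rows start with a numeric ID; match on the name column.
    $1 ~ /^[0-9]+$/ && $2 == name {
      # RAW_VALUE is field 10; trailing text such as "(Min/Max 16/30)"
      # lands in later fields and is ignored.
      print $10
      found = 1
      exit
    }
    # Exit with status 1 if the attribute was not in the table.
    END { if (!found) exit 1 }
  '
}
```

For example: sudo /usr/local/sbin/smartctl -A /dev/sdc | smart_raw Power_On_Hours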
Understanding the meaning of the attributes in the SMART DATA SECTION:
- Each attribute has a normalized value (VALUE), the worst (lowest) normalized value recorded so far (WORST), and a threshold value (THRESH).
- Each value ranges from 0 to 255. The values are printed under the columns “VALUE”, “WORST” and “THRESH”.
- If VALUE <= THRESH, the attribute has failed. If the attribute is of type Pre-fail, disk failure is imminent.
The attribute table printed by smartctl also shows the “TYPE” of each attribute.
Attributes are one of two possible types: Pre-fail or Old_age.
- Pre-fail attribute: if VALUE <= THRESH, the attribute has failed and disk failure is imminent.
- Old_age (or usage) attribute: if VALUE <= THRESH, the attribute indicates that the disk is reaching the end of its life from old age or normal wear.
Please note: the fact that an attribute is of type ‘Pre-fail’ does not mean that your disk is about to fail! It only has this meaning if the attribute's current normalized value (VALUE) is less than or equal to the threshold value (THRESH).
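The VALUE/THRESH check combined with the TYPE column can be sketched as a hypothetical awk-based helper named smart_failing. One assumption goes beyond the text above: it skips attributes whose THRESH is 000, because smartctl itself treats a zero threshold as "always passing" (most Old_age attributes in the sample table use it).

```shell
# smart_failing (hypothetical helper): list attributes whose normalized VALUE
# is at or below THRESH, with what that means for the attribute's TYPE.
# Reads smartctl attribute rows on stdin; fields used:
# 1=ID 2=ATTRIBUTE_NAME 4=VALUE 6=THRESH 7=TYPE
smart_failing() {
  awk '
    $1 ~ /^[0-9]+$/ {
      value = $4 + 0; thresh = $6 + 0
      # Assumption mirroring smartctl: a threshold of 0 never fails.
      if (thresh > 0 && value <= thresh) {
        if ($7 == "Pre-fail")
          print $1, $2, "Pre-fail: disk failure imminent"
        else
          print $1, $2, "Old_age: end of life from normal wear"
      }
    }
  '
}
```

For example: sudo /usr/local/sbin/smartctl -A /dev/sdc | smart_failing. No output means no attribute is currently at or below its threshold.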
- If the attribute's current normalized value (VALUE) is less than or equal to the threshold value (THRESH), the “WHEN_FAILED” column displays “FAILING_NOW”.
- If not, but the WORST recorded value is less than or equal to the threshold value (THRESH), this column displays “In_the_past”.
- The key thing to remember when reading this table: if the “WHEN_FAILED” column has no entry (indicated by a dash: ‘-’), the attribute is OK now (not failing) and has never failed in the past.
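The WHEN_FAILED rules above can be recomputed from the VALUE, WORST and THRESH columns, which can help when post-processing saved output. This is a sketch assuming the table layout shown earlier (smart_when_failed is a hypothetical name). Like smartctl, it treats a THRESH of 000 as never failing.

```shell
# smart_when_failed (hypothetical helper): recompute the WHEN_FAILED column
# from VALUE (field 4), WORST (field 5) and THRESH (field 6).
smart_when_failed() {
  awk '
    $1 ~ /^[0-9]+$/ {
      value = $4 + 0; worst = $5 + 0; thresh = $6 + 0
      if (thresh > 0 && value <= thresh)      state = "FAILING_NOW"
      else if (thresh > 0 && worst <= thresh) state = "In_the_past"
      else                                    state = "-"   # OK now, never failed
      print $1, $2, state
    }
  '
}
```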
Example: determining the longevity of an SSD
The lifespan of an SSD can be estimated from indicators such as the following:
226 Workld_Media_Wear_Indic 0x0032 100 100 000 Old_age Always - 2691000
...
233 Media_Wearout_Indicator 0x0032 095 095 000 Old_age Always - 0
241 Host_Writes_32MiB 0x0032 100 100 000 Old_age Always - 1906088
242 Host_Reads_32MiB 0x0032 100 100 000 Old_age Always - 4105900
Note that these attribute names and IDs are vendor-specific, although most vendors expose similar indicators. In general, an SSD eventually wears out when its flash cells reach their limit of erase cycles, so the wear consumed is roughly proportional to the amount of data written.
“Host_Writes_32MiB” is the number of 32 MiB units ever written (and re-written) to the SSD by the host. In this case, it is 32 MiB x 1,906,088 = 60,994,816 MiB, roughly 58 TiB (about 64 TB). Refer to your vendor-specific documentation to estimate the life left on the SSD.
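The unit conversion for Host_Writes_32MiB can be scripted. The helper below (host_writes_tb, a hypothetical name) multiplies the raw count by 32 MiB and prints the total in TiB and decimal TB. The optional second argument is a rated endurance in TBW that you would take from your vendor's datasheet; this sketch assumes the rating is in decimal terabytes. Treat the resulting percentage as a rough indication only, since it counts host writes, not the drive's internal NAND writes.

```shell
# host_writes_tb COUNT [RATED_TBW] (hypothetical helper): convert a
# Host_Writes_32MiB raw count into TiB and decimal TB, and optionally
# compare it with a vendor-rated endurance given in TB written.
host_writes_tb() {
  awk -v n="$1" -v rated="${2:-0}" 'BEGIN {
    bytes = n * 32 * 1024 * 1024          # each unit is 32 MiB
    tb = bytes / 1e12                     # decimal terabytes
    printf "%.2f TiB (%.2f TB) written\n", bytes / 1024^4, tb
    if (rated + 0 > 0)
      printf "%.1f%% of a rated %s TBW\n", 100 * tb / rated, rated
  }'
}
```

For the example above, host_writes_tb 1906088 prints "58.17 TiB (63.96 TB) written".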
Additional info
https://blog.shadypixel.com/monitoring-hard-drive-health-on-linux-with-smartmontools/
http://docs.slackware.com/howtos:hardware:smart_hdd_diagnostics