Decoding smartctl output



Decoding smartctl disk diagnostic output

Installing smartctl

Most Linux distributions package smartctl as part of smartmontools. For example, on an RPM-based system you can use:

yum install smartmontools

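Otherwise, you can compile from the source code directly. By default, a source build installs smartctl under /usr/local/sbin, which is the path used in the commands below: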

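wget -O smart.tar.gz 'https://sourceforge.net/projects/smartmontools/files/latest/download?source=files'
# extract into a fixed directory name, because the version in the tarball changes
mkdir smartmontools-src
tar -xvf smart.tar.gz -C smartmontools-src --strip-components=1
cd smartmontools-src/
./configure
make
sudo make install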

Running smartctl to print all SMART information for a device (-a)

sudo /usr/local/sbin/smartctl -a /dev/sdc

Running smartctl to print all information available for a device, SMART and non-SMART (-x)

sudo /usr/local/sbin/smartctl -x /dev/sdc

SMART data attributes table (example output of smartctl -a)

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   094   094   000    Old_age   Always       -       2
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       15537
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       17
170 Unknown_Attribute       0x0033   099   099   010    Pre-fail  Always       -       0
171 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       1
172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
174 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       14
175 Program_Fail_Count_Chip 0x0033   100   100   010    Pre-fail  Always       -       391406158447
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   090    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   082   071   000    Old_age   Always       -       18 (Min/Max 16/30)
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       14
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       18
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       10476538
226 Load-in_Time            0x0032   100   100   000    Old_age   Always       -       65535
227 Torq-amp_Count          0x0032   100   100   000    Old_age   Always       -       4294967295
228 Power-off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       65535
232 Available_Reservd_Space 0x0033   099   099   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0032   001   001   000    Old_age   Always       -       0
234 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       10476538
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       19054896
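The table is whitespace-separated, so single fields can be pulled out with awk. This is a minimal sketch that assumes the table is piped in from `smartctl -A` (which prints only the attribute table) or from `smartctl -a`. The function name `smart_attr` is hypothetical. Columns are looked up by their header name rather than by a fixed position.

```shell
# smart_attr ID [COLUMN]  (hypothetical helper, not part of smartmontools)
# Read a smartctl attribute table on stdin and print one column for the
# attribute with the given ID. COLUMN is a header name such as VALUE,
# WORST, THRESH or RAW_VALUE (default: RAW_VALUE).
smart_attr() {
    awk -v id="$1" -v col="${2:-RAW_VALUE}" '
        # Header row: remember which field number each column name uses.
        $1 == "ID#" {
            for (i = 1; i <= NF; i++) idx[$i] = i
            last = NF; in_table = 1; next
        }
        # A blank line ends the attribute table, so later sections
        # (self-test logs etc.) are not mistaken for attribute rows.
        NF == 0 { in_table = 0 }
        in_table && (col in idx) && $1 == id {
            out = $(idx[col])
            # The last column (RAW_VALUE) may contain spaces, e.g.
            # "18 (Min/Max 16/30)", so keep the rest of the line for it.
            if (idx[col] == last)
                for (i = last + 1; i <= NF; i++) out = out " " $i
            print out
        }
    '
}
```

For example, `sudo /usr/local/sbin/smartctl -A /dev/sdc | smart_attr 5` would print the raw reallocated sector count. That count is 2 in the table above.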

Understanding the meaning of the attributes in the SMART DATA SECTION:

  • Each attribute has a normalized current value (VALUE), the worst (lowest) normalized value recorded so far (WORST), and a threshold value (THRESH). Normalized values are scaled by the vendor so that higher is better.

  • Each of these values ranges from 0 to 255. They are printed under the columns “VALUE”, “WORST” and “THRESH”.

  • If VALUE <= THRESH, the attribute has failed. If the attribute is of type Pre-fail, disk failure is imminent.

The Attribute table printed out by smartctl also shows the “TYPE” of the Attribute.
Attributes are one of two possible types: Pre-failure (shown as Pre-fail) or Old age (shown as Old_age).

  • Pre-fail attribute:

if VALUE <= THRESH, the attribute has failed and disk failure is imminent.

  • Old_age (or usage) attribute:

if VALUE <= THRESH, the attribute indicates that the disk is reaching the end of its life from old age or normal wear.

Please note: an attribute being of type ‘Pre-fail’ does not mean that your disk is about to fail! It only has that meaning if the attribute's current normalized value (VALUE) is less than or equal to the threshold value (THRESH).

  • If the attribute's current normalized value (VALUE) is less than or equal to the threshold value (THRESH), the “WHEN_FAILED” column displays “FAILING_NOW”.

  • If not, but the WORST recorded value is less than or equal to THRESH, the column displays “In_the_past”.

  • The key thing to remember when reading this table is the following:

If the “WHEN_FAILED” column has no entry (indicated by a dash: “-”), the attribute is OK now (not failing) and has never failed in the past.
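The rules above can be sketched as a small awk filter that recomputes the WHEN_FAILED status from VALUE, WORST and THRESH. This is an illustration and not smartctl's own code. The function name `smart_check` is hypothetical. The filter assumes the default column layout of `smartctl -A` / `-a`; the `-x` output uses the shorter "brief" layout, which has no TYPE column. It also assumes that a threshold of 000 never trips, which is how smartmontools treats it to our understanding.

```shell
# smart_check  (hypothetical helper, not part of smartmontools)
# Read a smartctl attribute table (default layout) on stdin and print
# "ID NAME STATUS" for each attribute, using the rules above:
#   VALUE <= THRESH -> FAILING_NOW
#   WORST <= THRESH -> In_the_past
#   otherwise       -> -
smart_check() {
    awk '
        $1 == "ID#" { in_table = 1; next }   # start of the attribute table
        NF == 0     { in_table = 0 }         # a blank line ends it
        in_table && $1 ~ /^[0-9]+$/ {
            value = $4 + 0; worst = $5 + 0; thresh = $6 + 0; type = $7
            if (thresh == 0)
                status = "-"                 # assumption: threshold 000 never fails
            else if (value <= thresh)
                status = "FAILING_NOW"
            else if (worst <= thresh)
                status = "In_the_past"
            else
                status = "-"
            note = ""
            if (status == "FAILING_NOW" && type == "Pre-fail")
                note = " (Pre-fail: disk failure imminent)"
            else if (status == "FAILING_NOW")
                note = " (Old_age: end of life from wear)"
            print $1, $2, status note
        }
    '
}
```

Usage: `sudo /usr/local/sbin/smartctl -A /dev/sdc | smart_check`. For the example table earlier in this note, every attribute would be reported with “-”.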

Example: determining the longevity of an SSD

The lifespan of an SSD can be estimated from indicators such as the following:

226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       2691000
.
.
.
233 Media_Wearout_Indicator 0x0032   095   095   000    Old_age   Always       -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       1906088
242 Host_Reads_32MiB        0x0032   100   100   000    Old_age   Always       -       4105900

Note that the attributes above are vendor-specific, but similar ones should exist on drives from other vendors (the IDs, names and units may differ). In general, an SSD wears out when its flash cells reach their limit of erase cycles, so the wear on the drive grows roughly in proportion to the amount of data written to it.

The “Host_Writes_32MiB” attribute counts the data ever written (and re-written) to the SSD by the host, in units of 32 MiB. In this case, it is 1906088 × 32 MiB = 60,994,816 MiB, or about 58 TiB (roughly 64 TB). Refer to your vendor-specific documentation to estimate the life left on the SSD.
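The conversion above can be checked with awk, because POSIX shell arithmetic is integer-only. This is a minimal sketch. Both function names are hypothetical. The rated endurance passed to `life_used_pct` is an input you must take from the vendor's datasheet, and it is assumed to be in decimal terabytes. The 600 TB figure used below is made up for illustration and is not this drive's rating.

```shell
# host_writes_tib RAW  (hypothetical helper)
# Convert a Host_Writes_32MiB raw value to TiB (1 TiB = 1024 * 1024 MiB).
host_writes_tib() {
    awk -v raw="$1" 'BEGIN { printf "%.2f TiB\n", raw * 32 / (1024 * 1024) }'
}

# life_used_pct RAW RATED_TB  (hypothetical helper)
# Percentage of a rated write endurance already consumed.
# RATED_TB is assumed to be in decimal terabytes (10^12 bytes).
life_used_pct() {
    awk -v raw="$1" -v rated="$2" 'BEGIN {
        bytes = raw * 32 * 1024 * 1024      # each unit is 32 MiB
        printf "%.1f%%\n", bytes / (rated * 1e12) * 100
    }'
}
```

For the drive above, `host_writes_tib 1906088` prints `58.17 TiB`. With the hypothetical 600 TB rating, `life_used_pct 1906088 600` prints `10.7%`.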

Additional info

https://blog.shadypixel.com/monitoring-hard-drive-health-on-linux-with-smartmontools/

http://docs.slackware.com/howtos:hardware:smart_hdd_diagnostics