[Nagiosplug-devel] [ nagiosplug-Bugs-2930789 ] check_ide_smart ignores SMART errors !
SourceForge.net
noreply at sourceforge.net
Tue Jan 12 18:55:40 CET 2010
Bugs item #2930789, was opened at 2010-01-12 17:55
Message generated for change (Tracker Item Submitted) made by oernii
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=397597&aid=2930789&group_id=29880
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: General plugin execution
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Ernest Beinrohr (oernii)
Assigned to: Nobody/Anonymous (nobody)
Summary: check_ide_smart ignores SMART errors !
Initial Comment:
I began using this plugin and have now found that it is no good. Everything seemed fine: the plugin checks the disks and some tests. But now I do have a BAD drive. smartctl reports 7 unrecoverable errors, a clear sign of imminent failure, so I am replacing the drive. Yet check_ide_smart reports that everything is OK!
There is clearly a problem: the plugin should NOT ignore something like SMART errors. Detecting them is its main and only purpose. Here are the outputs for my /dev/sdd.
PS: check_ide_smart v1991 (nagios-plugins 1.4.13). I also tried nagios-plugins-1.4.14-61-g45e2.
$ /usr/lib/nagios/plugins/check_ide_smart -d /dev/sdd
Id=  1, Status=15 {PreFailure , OnLine }, Value=114, Threshold=  6, Passed
Id=  3, Status= 3 {PreFailure , OnLine }, Value= 93, Threshold=  0, Passed
Id=  4, Status=50 {Advisory , OnLine }, Value=100, Threshold= 20, Passed
Id=  5, Status=51 {PreFailure , OnLine }, Value=100, Threshold= 36, Passed
Id=  7, Status=15 {PreFailure , OnLine }, Value= 43, Threshold= 30, Passed
Id=  9, Status=50 {Advisory , OnLine }, Value= 90, Threshold=  0, Passed
Id= 10, Status=19 {PreFailure , OnLine }, Value=100, Threshold= 97, Passed
Id= 12, Status=50 {Advisory , OnLine }, Value=100, Threshold= 20, Passed
Id=184, Status=50 {Advisory , OnLine }, Value=100, Threshold= 99, Passed
Id=187, Status=50 {Advisory , OnLine }, Value= 93, Threshold=  0, Passed
Id=188, Status=50 {Advisory , OnLine }, Value=100, Threshold=  0, Passed
Id=189, Status=58 {Advisory , OnLine }, Value=100, Threshold=  0, Passed
Id=190, Status=34 {Advisory , OnLine }, Value= 86, Threshold= 45, Passed
Id=194, Status=34 {Advisory , OnLine }, Value= 14, Threshold=  0, Passed
Id=195, Status=26 {Advisory , OnLine }, Value= 46, Threshold=  0, Passed
Id=197, Status=18 {Advisory , OnLine }, Value=100, Threshold=  0, Passed
Id=198, Status=16 {Advisory , OffLine}, Value=100, Threshold=  0, Passed
Id=199, Status=62 {Advisory , OnLine }, Value=200, Threshold=  0, Passed
OffLineStatus=130 {Completed}, AutoOffLine=Yes, OffLineTimeout=10 minutes
OffLineCapability=123 {Immediate Auto SuspendOnCmd}
SmartRevision=10, CheckSum=161, SmartCapability=3 {SaveOnStandBy AutoSave}
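The attribute lines above suggest why the plugin reports OK: every normalized Value sits above its Threshold, including attribute 187, whose threshold is 0. The following is only a rough Python sketch of that comparison, not the plugin's actual code. The names `parse_attributes` and `normalized_ok` are hypothetical, and the pass rule (value strictly above threshold) is an assumption inferred from the output.

```python
import re

# Matches lines such as:
#   Id=187, Status=50 {Advisory , OnLine }, Value= 93, Threshold= 0, Passed
LINE_RE = re.compile(
    r"Id=\s*(\d+),\s*Status=\s*\d+\s*\{[^}]*\},"
    r"\s*Value=\s*(\d+),\s*Threshold=\s*(\d+),\s*(\w+)"
)

def parse_attributes(text):
    """Return a list of (id, value, threshold, verdict) tuples."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            rows.append((int(m.group(1)), int(m.group(2)),
                         int(m.group(3)), m.group(4)))
    return rows

def normalized_ok(value, threshold):
    # ASSUMPTION: a normalized value above its threshold counts as passing.
    return value > threshold

# Three lines copied from the report above.
sample = """\
Id= 5, Status=51 {PreFailure , OnLine }, Value=100, Threshold= 36, Passed
Id= 7, Status=15 {PreFailure , OnLine }, Value= 43, Threshold= 30, Passed
Id=187, Status=50 {Advisory , OnLine }, Value= 93, Threshold= 0, Passed
"""

rows = parse_attributes(sample)
for attr_id, value, threshold, verdict in rows:
    print(attr_id, value, threshold, verdict, normalized_ok(value, threshold))
```

Under this rule, a check that looks only at normalized values cannot see the raw count of 7 for attribute 187 or the ATA error log.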
-------------------
$ smartctl -a /dev/sdd
smartctl version 5.38 [i586-mandriva-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF INFORMATION SECTION ===
Device Model: MAXTOR STM31000340AS
Serial Number: 9QJ1CCR0
Firmware Version: MX1A
User Capacity: 1 000 204 886 016 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Tue Jan 12 18:46:47 2010 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 634) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 227) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x103b) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   114   099   006    Pre-fail  Always       -       60566982
  3 Spin_Up_Time            0x0003   093   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       40
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   043   037   030    Pre-fail  Always       -       61272258239168
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       9615
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       40
184 Unknown_Attribute       0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   093   093   000    Old_age   Always       -       7
188 Unknown_Attribute       0x0032   100   099   000    Old_age   Always       -       4295032834
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   086   049   045    Old_age   Always       -       14 (Lifetime Min/Max 4/30)
194 Temperature_Celsius     0x0022   014   051   000    Old_age   Always       -       14 (0 4 0 0)
195 Hardware_ECC_Recovered  0x001a   046   026   000    Old_age   Always       -       60566982
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
SMART Error Log Version: 1
ATA Error Count: 7 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 7 occurred at disk power-on lifetime: 9006 hours (375 days + 6 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 03 b8 13 00 Error: UNC at LBA = 0x0013b803 = 1292291
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 fd b7 13 e0 00 2d+21:57:34.419 READ DMA
27 00 00 00 00 00 e0 00 2d+21:57:34.417 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 00 2d+21:57:34.397 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 00 2d+21:57:34.376 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 e0 00 2d+21:57:34.336 READ NATIVE MAX ADDRESS EXT
Error 6 occurred at disk power-on lifetime: 9006 hours (375 days + 6 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 03 b8 13 00 Error: UNC at LBA = 0x0013b803 = 1292291
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 fd b7 13 e0 00 2d+21:57:31.288 READ DMA
27 00 00 00 00 00 e0 00 2d+21:57:31.287 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 00 2d+21:57:31.267 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 00 2d+21:57:31.247 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 e0 00 2d+21:57:31.196 READ NATIVE MAX ADDRESS EXT
Error 5 occurred at disk power-on lifetime: 9006 hours (375 days + 6 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 03 b8 13 00 Error: UNC at LBA = 0x0013b803 = 1292291
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 fd b7 13 e0 00 2d+21:57:28.198 READ DMA
27 00 00 00 00 00 e0 00 2d+21:57:28.197 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 00 2d+21:57:28.177 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 00 2d+21:57:28.156 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 e0 00 2d+21:57:28.116 READ NATIVE MAX ADDRESS EXT
Error 4 occurred at disk power-on lifetime: 9006 hours (375 days + 6 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 03 b8 13 00 Error: UNC at LBA = 0x0013b803 = 1292291
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 fd b7 13 e0 00 2d+21:57:25.007 READ DMA
27 00 00 00 00 00 e0 00 2d+21:57:25.005 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 00 2d+21:57:24.985 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 00 2d+21:57:24.966 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 e0 00 2d+21:57:24.907 READ NATIVE MAX ADDRESS EXT
Error 3 occurred at disk power-on lifetime: 9006 hours (375 days + 6 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 02 b8 13 00 Error: UNC at LBA = 0x0013b802 = 1292290
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 fd b7 13 e0 00 2d+21:57:21.899 READ DMA
27 00 00 00 00 00 e0 00 2d+21:57:21.897 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 00 2d+21:57:21.877 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 00 2d+21:57:21.860 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 e0 00 2d+21:57:21.807 READ NATIVE MAX ADDRESS EXT
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
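The smartctl output above carries two signals that the plugin output does not reflect: the "ATA Error Count" line and the raw value of attribute 187 (Reported_Uncorrect). As a rough illustration of a check that would catch them, here is a minimal Python sketch that parses text in the format shown above. The function names and the policy (any logged error is CRITICAL) are assumptions made for illustration, not how check_ide_smart works or should be fixed. The exit codes follow the usual Nagios plugin convention.

```python
import re

OK, WARNING, CRITICAL = 0, 1, 2  # conventional Nagios plugin exit codes

def ata_error_count(text):
    """Return the logged ATA error count, or 0 if the line is absent."""
    m = re.search(r"^ATA Error Count:\s*(\d+)", text, re.MULTILINE)
    return int(m.group(1)) if m else 0

def raw_value(text, attr_id):
    """Return the RAW_VALUE column of one attribute row, or None."""
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 whitespace-separated columns:
        # ID, name, flag, value, worst, thresh, type, updated,
        # when_failed, raw (the raw value may be followed by notes).
        if (len(fields) >= 10 and fields[0] == str(attr_id)
                and fields[9].isdigit()):
            return int(fields[9])
    return None

def check(text):
    """Hypothetical policy: any logged error or uncorrectable read is CRITICAL."""
    errors = ata_error_count(text)
    uncorrectable = raw_value(text, 187) or 0
    if errors > 0 or uncorrectable > 0:
        return CRITICAL, (f"CRITICAL - ATA errors: {errors}, "
                          f"Reported_Uncorrect: {uncorrectable}")
    return OK, "OK - no logged ATA errors"

# Lines copied from the smartctl output in this report.
sample = """\
187 Reported_Uncorrect 0x0032 093 093 000 Old_age Always - 7
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
ATA Error Count: 7 (device log contains only the most recent five errors)
"""

status, message = check(sample)
print(status, message)
```

In practice the text would come from running `smartctl -a` on the device, as in the report; the sketch takes a string so it can be tried without a disk.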