[SGVLUG] hd errors in /var/log/message
James Neff
jneff at tethyshealth.com
Thu May 15 06:40:21 PDT 2008
Thank you, Claude (and others), for your response.
I ran smartctl -a /dev/hda
Here are the interesting lines:
SMART overall-health self-assessment test result: PASSED
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   252   252   063    Pre-fail  Always       -       2340
  4 Start_Stop_Count        0x0032   253   253   000    Old_age   Always       -       19
  5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always       -       1
  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline      -       0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0027   248   236   187    Pre-fail  Always       -       34604
  9 Power_On_Hours          0x0032   197   197   000    Old_age   Always       -       59860
 10 Spin_Retry_Count        0x002b   252   252   157    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   253   253   000    Old_age   Always       -       21
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always       -       10
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -       65
194 Temperature_Celsius     0x0032   253   253   000    Old_age   Always       -       26
195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always       -       8861
196 Reallocated_Event_Count 0x0008   253   253   000    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       1
198 Offline_Uncorrectable   0x0008   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0008   199   199   000    Old_age   Offline      -       0
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   252   000    Old_age   Always       -       3
202 TA_Increase_Count       0x000a   253   252   000    Old_age   Always       -       0
203 Run_Out_Cancel          0x000b   253   252   180    Pre-fail  Always       -       1
204 Shock_Count_Write_Opern 0x000a   253   252   000    Old_age   Always       -       0
205 Shock_Rate_Write_Opern  0x000a   253   252   000    Old_age   Always       -       0
207 Spin_High_Current       0x002a   252   252   000    Old_age   Always       -       0
208 Spin_Buzz               0x002a   252   252   000    Old_age   Always       -       0
209 Offline_Seek_Performnce 0x0024   253   253   000    Old_age   Offline      -       0
 99 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
100 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
101 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
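
The two sector counters with a non-zero raw value in that listing
(Reallocated_Sector_Ct and Current_Pending_Sector) are easy to miss among
the other attributes. Here is a minimal sketch for pulling them out,
assuming smartctl's native layout of one attribute per line with
RAW_VALUE as the tenth field. The function name flag_sector_attrs is
made up for illustration.

```shell
# Print sector-health attributes whose raw value is non-zero.
# Reads `smartctl -A` style output (one attribute per line) on stdin.
# Assumption: field 2 is the attribute name and field 10 is RAW_VALUE.
flag_sector_attrs() {
    awk '($2 ~ /^(Reallocated_Sector_Ct|Reallocated_Event_Count|Current_Pending_Sector|Offline_Uncorrectable)$/) && ($10 + 0 > 0) {
        print $2 "=" $10
    }'
}

# Typical use (needs root and smartmontools installed):
#   smartctl -A /dev/hda | flag_sector_attrs
```

Anything it prints is worth watching over time; an empty result only
means these four counters are all zero, not that the drive is healthy.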
We have SpinRite, so I will run that to see what it finds and whether it
can fix it.
I found this article too:
http://www.linuxjournal.com/article/6983
They mention the smartd.conf file but I don't think they answer your
question about the DEVICESCAN option.
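
For what it's worth, my reading of the smartd.conf man page is that a
non-comment DEVICESCAN entry makes smartd scan for devices itself and
ignore every line after it, so it would stay uncommented only when the
drives are not listed explicitly. Here is a rough sketch for checking
which lines of a config would be ignored under that rule. The function
name check_devicescan is hypothetical, and backslash line continuations
are not handled.

```shell
# Classify the entries of a smartd.conf-style file, assuming the rule that
# DEVICESCAN makes smartd ignore every configuration line that follows it.
# Argument: path to the config file to inspect.
check_devicescan() {
    awk '
        /^[ \t]*(#|$)/     { next }                            # skip comments and blank lines
        seen               { print "ignored: " $0; next }      # entries after DEVICESCAN
        $1 == "DEVICESCAN" { seen = 1; print "active: " $0; next }
                           { print "device: " $0 }             # explicit device entry
    ' "$1"
}
```

Usage would be something like `check_devicescan /etc/smartd.conf`
(the usual location, though it may differ by distribution).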
Thanks again for the thoughtful response.
--James
Claude Felizardo wrote:
> On Wed, May 14, 2008 at 7:04 AM, James Neff <jneff at tethyshealth.com> wrote:
>
>> Greetings,
>>
>> I am seeing this in my /var/log/messages:
>>
>> May 11 04:05:59 private-gateway kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
>> May 11 04:05:59 private-gateway kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=79957639, sector=79957632
>> May 11 04:05:59 private-gateway kernel: ide: failed opcode was: unknown
>> May 11 04:05:59 private-gateway kernel: end_request: I/O error, dev hda, sector 79957632
>>
>>
>> It only occurs at 4:05 am, when my daily crontab runs. Also, it is
>> always the same sector.
>>
>> Should I be concerned or take any action at this time?
>> Thanks in advance,
>> James
>>
>
> Have you checked the SMART attributes using smartctl or equiv? Run
> any self tests?
>
> see http://www.ntfs.com/disk-monitor-smart-attributes.htm
>
> here's what I get on my machine here at work
>
> smartctl --health /dev/sda
> smartctl version 5.37 [i586-mandriva-linux-gnu] Copyright (C) 2002-6 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> Have you done a quick google search on "DriveReady SeekComplete Error"?
>
>
> Aren't SMART drives supposed to remap when you get a bad sector?
> According to a few sites I looked at, it might require a reformat or
> something to force the remapping.
>
> One place suggests checking Current_Pending_Sector to see if it's
> anything but zero:
>
> smartctl --all /dev/sda | grep Sector
>   5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
>
>
> Check here for how to run self tests:
> http://www.captain.at/howto-linux-smartmontools-smartctl.php
>
> Be careful, though. I once ran some of these when I was tracking
> down a bad cable and boy did that make a mess of things...
>
> smartctl -t short /dev/sda
> Please wait 2 minutes for test to complete.
> Test will complete after Wed May 14 16:24:05 2008
> ...
> smartctl -l selftest /dev/sda
> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Short offline       Completed without error       00%     22494         -
>
>
> Hey, does anyone use smartd and have an example configuration file in
> use? I'm a little confused on the use of the DEVICESCAN option.
> Should it be used or commented out?
>
>
> btw, do you monitor the temperature of the drive to see if it's
> getting significantly warmer during this time? At home, two of the
> drives climb a few degrees but the one at the top jumps about 4
> degrees for a few hours while it's running msec and stuff.
>
>