[SGVLUG] hd errors in /var/log/message

Thu May 15 06:40:21 PDT 2008

Thank you Claude (and others) for your response.

I ran smartctl -a /dev/hda

Here are the interesting lines:

SMART overall-health self-assessment test result: PASSED

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   252   252   063    Pre-fail  Always       -       2340
  4 Start_Stop_Count        0x0032   253   253   000    Old_age   Always       -       19
  5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always       -       1
  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline      -       0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0027   248   236   187    Pre-fail  Always       -       34604
  9 Power_On_Hours          0x0032   197   197   000    Old_age   Always       -       59860
 10 Spin_Retry_Count        0x002b   252   252   157    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   253   253   000    Old_age   Always       -       21
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always       -       10
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -       65
194 Temperature_Celsius     0x0032   253   253   000    Old_age   Always       -       26
195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always       -       8861
196 Reallocated_Event_Count 0x0008   253   253   000    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       1
198 Offline_Uncorrectable   0x0008   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0008   199   199   000    Old_age   Offline      -       0
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   252   000    Old_age   Always       -       3
202 TA_Increase_Count       0x000a   253   252   000    Old_age   Always       -       0
203 Run_Out_Cancel          0x000b   253   252   180    Pre-fail  Always       -       1
204 Shock_Count_Write_Opern 0x000a   253   252   000    Old_age   Always       -       0
205 Shock_Rate_Write_Opern  0x000a   253   252   000    Old_age   Always       -       0
207 Spin_High_Current       0x002a   252   252   000    Old_age   Always       -       0
208 Spin_Buzz               0x002a   252   252   000    Old_age   Always       -       0
209 Offline_Seek_Performnce 0x0024   253   253   000    Old_age   Offline      -       0
 99 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
100 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
101 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
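
To pick the sector-related counters out of a table like this without reading it by eye, something along these lines might work. It is only a sketch: the names `parse_attributes`, `nonzero_sector_counters`, `WATCHED_IDS` and `SAMPLE` are made up for illustration, and it assumes each attribute row sits on a single line, the way smartctl prints it before a mail client wraps it. The sample rows are copied from the output above.

```python
import re

# A few rows copied from the smartctl output in this message,
# one attribute per line (assumption: no mail-client line wrapping).
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always       -       1
194 Temperature_Celsius     0x0032   253   253   000    Old_age   Always       -       26
197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       1
198 Offline_Uncorrectable   0x0008   253   253   000    Old_age   Offline      -       0
"""

# ID, name, flag, VALUE, WORST, THRESH, type, updated, when_failed, raw
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+(?P<flag>0x[0-9a-fA-F]+)\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"(?P<type>\S+)\s+(?P<updated>\S+)\s+(?P<failed>\S+)\s+(?P<raw>\d+)"
)

# Attribute IDs for reallocated, pending and offline-uncorrectable sectors.
WATCHED_IDS = (5, 197, 198)


def parse_attributes(text):
    """Return {attribute_id: {...}} for every line that looks like a table row."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            # Keyed by ID because names such as Unknown_Attribute can repeat.
            attrs[int(m.group("id"))] = {
                "name": m.group("name"),
                "value": int(m.group("value")),
                "worst": int(m.group("worst")),
                "thresh": int(m.group("thresh")),
                "raw": int(m.group("raw")),
            }
    return attrs


def nonzero_sector_counters(attrs):
    """Names of the watched attributes whose RAW_VALUE is above zero."""
    return [attrs[i]["name"] for i in WATCHED_IDS
            if i in attrs and attrs[i]["raw"] > 0]


if __name__ == "__main__":
    print(nonzero_sector_counters(parse_attributes(SAMPLE)))
```

Feeding it the full text of `smartctl -a /dev/hda` instead of `SAMPLE` should work the same way, since lines that do not match the row pattern are skipped.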

We have SpinRite, so I will run that to see what it finds and whether
it can fix it.

I found this article too:

http://www.linuxjournal.com/article/6983

They mention the smartd.conf file but I don't think they answer your 
question about the DEVICESCAN option.

Thanks again for the thoughtful response. 

--James

Claude Felizardo wrote:
> On Wed, May 14, 2008 at 7:04 AM, James Neff <jneff at tethyshealth.com> wrote:
>   
>> Greetings,
>>
>> I am seeing this in my /var/log/messages:
>>
>> May 11 04:05:59 private-gateway kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
>> May 11 04:05:59 private-gateway kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=79957639, sector=79957632
>> May 11 04:05:59 private-gateway kernel: ide: failed opcode was: unknown
>> May 11 04:05:59 private-gateway kernel: end_request: I/O error, dev hda, sector 79957632
>>
>>
>> It only occurs at 4:05 am, when my daily cron jobs run.  Also, it is
>> always the same sector.
>>
>> Should I be concerned or take any action at this time?
>> Thanks in advance,
>> James
>>     
>
> Have you checked the SMART attributes using smartctl or equiv?  Run
> any self tests?
>
> see http://www.ntfs.com/disk-monitor-smart-attributes.htm
>
> here's what I get on my machine here at work
>
> smartctl --health /dev/sda
> smartctl version 5.37 [i586-mandriva-linux-gnu] Copyright (C) 2002-6 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> Have you done a quick google search on "DriveReady SeekComplete Error"?
>
>
> Aren't SMART drives supposed to remap when they hit a bad sector?
> According to a few sites I looked at, it might require a reformat or
> something to force the remapping.
>
> One place suggests checking Current_Pending_Sector to see if it's
> anything but zero:
>
> smartctl --all /dev/sda | grep Sector
>   5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
>
>
> Check here for how to run self tests:
> http://www.captain.at/howto-linux-smartmontools-smartctl.php
>
> Be careful, though.  I once ran some of these when I was tracking
> down a bad cable and boy did that make a mess of things...
>
> smartctl -t short  /dev/sda
> Please wait 2 minutes for test to complete.
> Test will complete after Wed May 14 16:24:05 2008
> ...
> smartctl -l selftest  /dev/sda
> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Short offline       Completed without error       00%     22494         -
>
>
> Hey, does anyone use smartd and have an example configuration file in
> use?  I'm a little confused on the use of the DEVICESCAN option.
> Should it be used or commented out?
>
>
> btw, do you monitor the temperature of the drive to see if it's
> getting significantly warmer during this time?  At home, two of the
> drives climb a few degrees but the one at the top jumps about 4
> degrees for a few hours while it's running msec and stuff.
>
>   
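
For the kernel messages quoted above, a small script could confirm that the failures really do land on the same sector each time and show roughly where on the disk that is. This is a sketch under stated assumptions: `failing_sectors`, `byte_offset` and `SAMPLE_LOG` are names invented here, the log lines are pasted unwrapped, and the 512-byte sector size is an assumption about this drive rather than something the log states.

```python
import re
from collections import Counter

# The two relevant lines from the quoted log, joined back onto single lines.
SAMPLE_LOG = """\
May 11 04:05:59 private-gateway kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=79957639, sector=79957632
May 11 04:05:59 private-gateway kernel: end_request: I/O error, dev hda, sector 79957632
"""

# Matches the kernel's "end_request: I/O error" line and captures device and sector.
END_REQUEST = re.compile(
    r"end_request: I/O error, dev (?P<dev>\w+), sector (?P<sector>\d+)"
)

SECTOR_BYTES = 512  # assumption: traditional 512-byte sectors


def failing_sectors(log_text):
    """Count how often each (device, sector) pair shows up in I/O error lines."""
    counts = Counter()
    for m in END_REQUEST.finditer(log_text):
        counts[(m.group("dev"), int(m.group("sector")))] += 1
    return counts


def byte_offset(sector, sector_bytes=SECTOR_BYTES):
    """Byte offset of a sector from the start of the device."""
    return sector * sector_bytes


if __name__ == "__main__":
    for (dev, sector), hits in failing_sectors(SAMPLE_LOG).items():
        print(dev, sector, hits, byte_offset(sector))
```

Run against the whole of the log file rather than `SAMPLE_LOG`, a single key in the result would back up the "always the same sector" observation, while several keys would suggest the problem is spreading.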
