[ale] Which large capacity drives are you having the best luck with?

Ron Frazier atllinuxenthinfo at c3energy.com
Thu Jan 6 01:51:28 EST 2011


Greg, I'll try to respond to your points, but we're getting over my
head here.  Before reading this, please read my SUMMARY OF SPINRITE
TECHNOLOGY post, which I sent just prior to this one.  I kept the
subject line the same to maintain continuity of the thread; the title I
just quoted is at the top of that post's text.  This will make more
sense if you read that first.

BEFORE I FORGET - DO NOT USE SPINRITE ON AN SSD DRIVE.  SSDs use
totally different technology, and writing to them excessively can
substantially reduce the life of the drive.  Also, due to the wear
leveling built in, I don't think an application can even be sure which
memory sector it's writing to.  The data recovery benefit is
questionable, surface analysis doesn't apply, and you may hurt your
drive.

See comments interspersed.

Ron

On Wed, 2011-01-05 at 19:23 -0500, Greg Freemyer wrote:
> See interspersed:
> 
> But first, have you looked at the data smart is tracking:
> 
> Try "smartctl -a /dev/sda " on your machine and get a feel for it if
> you want to delve this deep.
> 
> The big issue is that smart implementation varies by manufacturer for
> sure, and I think by model and even firmware.
> 
> So understanding what these fields mean is very difficult.  But if
> you're a home user with a small number of drives to worry about, you
> could record your full smart data dump every 6 months or so and get a
> feel for how different fields are growing, etc.
> 
> 
> === fyi: on my desktop at work ===
> 
> > sudo /usr/sbin/smartctl -a /dev/sda
> 
> smartctl 5.39.1 2010-01-28 r3054 [x86_64-unknown-linux-gnu] (openSUSE RPM)
> Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
> 
> === START OF INFORMATION SECTION ===
> Model Family:     Seagate Barracuda 7200.10 family
> Device Model:     ST3250310AS
> Serial Number:    9RY00PYW
> Firmware Version: 3.AAA
> User Capacity:    250,059,350,016 bytes
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   7
> ATA Standard is:  Exact ATA specification draft version not indicated
> Local Time is:    Wed Jan  5 18:29:58 2011 EST
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
> 
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> 
> General SMART Values:
> Offline data collection status:  (0x82) Offline data collection activity
>                                         was completed without error.
>                                         Auto Offline Data Collection: Enabled.
> Self-test execution status:      (   0) The previous self-test routine completed
>                                         without error or no self-test has ever
>                                         been run.
> Total time to complete Offline
> data collection:                 ( 430) seconds.
> Offline data collection
> capabilities:                    (0x5b) SMART execute Offline immediate.
>                                         Auto Offline data collection on/off support.
>                                         Suspend Offline collection upon new
>                                         command.
>                                         Offline surface scan supported.
>                                         Self-test supported.
>                                         No Conveyance Self-test supported.
>                                         Selective Self-test supported.
> SMART capabilities:            (0x0003) Saves SMART data before entering
>                                         power-saving mode.
>                                         Supports SMART auto save timer.
> Error logging capability:        (0x01) Error logging supported.
>                                         General Purpose Logging supported.
> Short self-test routine
> recommended polling time:        (   1) minutes.
> Extended self-test routine
> recommended polling time:        (  92) minutes.
> 
> SMART Attributes Data Structure revision number: 10
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE     UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000f   117   100   006    Pre-fail Always       -       127975369
>   3 Spin_Up_Time            0x0003   098   097   000    Pre-fail Always       -       0
>   4 Start_Stop_Count        0x0032   100   100   020    Old_age  Always       -       65
>   5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail Always       -       0
>   7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail Always       -       330102083
>   9 Power_On_Hours          0x0032   069   069   000    Old_age  Always       -       27871
>  10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   020    Old_age  Always       -       65
> 187 Reported_Uncorrect      0x0032   100   100   000    Old_age  Always       -       0
> 189 High_Fly_Writes         0x003a   100   100   000    Old_age  Always       -       0
> 190 Airflow_Temperature_Cel 0x0022   068   051   045    Old_age  Always       -       32 (Lifetime Min/Max 21/36)
> 194 Temperature_Celsius     0x0022   032   049   000    Old_age  Always       -       32 (0 20 0 0)
> 195 Hardware_ECC_Recovered  0x001a   069   061   000    Old_age  Always       -       140010573
> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age  Always       -       0
> 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age  Offline      -       0
> 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age  Always       -       0
> 200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age  Offline      -       0
> 202 Data_Address_Mark_Errs  0x0032   100   253   000    Old_age  Always       -       0
> 
> SMART Error Log Version: 1
> No Errors Logged
> 
> SMART Self-test log structure revision number 1
> 
> SMART Selective self-test log data structure revision number 1
>  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
>     1        0        0  Not_testing
>     2        0        0  Not_testing
>     3        0        0  Not_testing
>     4        0        0  Not_testing
>     5        0        0  Not_testing
> Selective self-test flags (0x0):
>   After scanning selected spans, do NOT read-scan remainder of disk.
> If Selective self-test is pending on power-up, resume after 0 minute delay.
> 

I'm going to save this test for another day.  I'm getting brain fog here
at 01:42 AM.  But thanks for posting the above example.  Some of this
data is available in the Ubuntu Disk Utility, but in a more readable
format.
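
By the way, if you want to follow Greg's suggestion of keeping a
history, a couple of commands run every few months will do it.  This is
just a sketch, assuming /dev/sda is your drive and that you want the
logs in a ~/smart-logs directory; adjust to taste:

  mkdir -p ~/smart-logs
  sudo smartctl -a /dev/sda > ~/smart-logs/sda-$(date +%Y-%m-%d).txt

Diff two of those files taken six months apart and you can see which
counters are growing.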

> ====
> 
> The first thing I look at is POH (Power on Hours).  In this case
> 27,871.  This field has been pretty reliable in my experience to be
> exactly what it says.  So my drive is not exactly new.
> 

Yes.  Also available in Disk Utility.

> Then look at Reallocated_Sector_Ct.  Mine is zero.  That's cool.
> 

Yes.  Zero is good.  Also available in Disk Utility.

> But Hardware_ECC_Recovered is 140,010,573.  That may sound large, but
> remember, the reads succeeded because of the ECC data, so there is no
> data loss.  I tend to agree with you that as magnetism fades for a
> sector, checksum failures increase and ECC recovery is needed.
> Spinrite used as you describe may keep that value lower.

For comparison: I'm running Spinrite right now on the new drive I just
bought, a Seagate 500 GB notebook drive, 7200 RPM.

After running for 7 hours and processing 90 GB of data (keep in mind
each sector goes through a read, invert, write, read, invert, write
cycle), it has encountered 47.24 million ECC-correctable errors.  Thus
far there have been no ultimate data errors once the ECC has done its
work.  The error rate is averaging 133,788 per million sectors.  If you
run the math, that works out to about 6.75 million errors per hour of
operation.  That number is going to fluctuate wildly based on the
amount and type of work the drive is doing; this analysis is stressing
the drive about as hard as possible.
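
For anyone who wants to check my arithmetic, the numbers above are
roughly self-consistent if you assume each sector gets read about twice
during that read / invert / write cycling.  A quick sketch with bc
(figures rounded, and the two-reads-per-sector assumption is mine, not
anything Spinrite documents):

  echo "47240000 / 7" | bc                       # ~6.7 million errors per hour
  echo "90 * 10^9 / 512" | bc                    # ~176 million sectors in 90 GB
  echo "47240000 * 10^6 / (2 * 175781250)" | bc  # ~134,000 errors per million sector reads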

In general, Spinrite doesn't report all the smart stats.  It focuses on
its own algorithms.  Also, if the drive or BIOS doesn't have smart
enabled, it reports no smart stats, but still does its job regardless.

> 
> But I don't think spinrite tries to detect sectors that have been ECC
> recovered.  So it doesn't really know the details.
> 
> A smart long self test has the ability to know that an ECC recovery is
> needed for a sector.  What it does with the knowledge, I don't know.
> But it certainly has more knowledge to work with than spinrite.
> 
> fyi: hdparm has a long read capability that allows a full physical
> sector to be read with no error correction!  So spinrite could in
> theory read all of the sectors with CRC verification disabled and
> check the CRC itself.  The trouble is that the drive manufacturers
> implement proprietary CRC / ECC solutions, so spinrite has no way to
> actually delve into the details of the sector's data accuracy.
> 

That last sentence is not correct.  It may be true that Spinrite cannot
calculate its own ECC.  However, if the sector doesn't read correctly,
the ECC correction is turned off.  Then, Spinrite reads the sector
repeatedly, starting from different head positions and flying in to the
target sector, and accumulates up to 2000 samples of erroneous sector
data.  It uses excruciatingly detailed statistical techniques to analyze
the samples and determine the most likely value, 1 or 0, for each bit.
It then reconstructs the sector and saves what it recovered back to the
drive after a surface analysis has verified that it's safe to do so.  In
many cases, just the repeated reading from different positions will
accomplish a perfect read.  If so, that perfect data is held until a
surface analysis verifies the magnetic reliability of the sector.
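
You can get a feel for the "read it many times" part of that with
nothing more than dd, although the per-bit statistics are Spinrite's own
secret sauce.  A minimal sketch, assuming a hypothetical device /dev/sdX
and a hypothetical LBA 123456 that you already suspect is marginal (it
only reads, so it's harmless, but double check the device name):

  DEV=/dev/sdX   # hypothetical device
  SEC=123456     # hypothetical suspect sector (LBA)
  # read the same 512 byte sector 20 times, bypassing the page cache,
  # and count how many distinct versions of the data come back
  for i in $(seq 1 20); do
    dd if=$DEV bs=512 skip=$SEC count=1 iflag=direct 2>/dev/null | md5sum
  done | sort | uniq -c

On a marginal sector you may see more than one checksum; a tool like
Spinrite collects far more samples than that and then votes bit by bit.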

> 
> On Wed, Jan 5, 2011 at 5:33 PM, Ron Frazier
> <atllinuxenthinfo at c3energy.com> wrote:
> > Hi Pat,
> >
> > We're getting a little above my level of knowledge on hard drive
> > operation here, but here's my take on it.  A modern drive is always
> > generating read errors and relying on ECC to get you the right data.
> 
> Mine has 140 million ECC corrections in 27,000 hr POH.  So that's
> about 5,000 per hour, or more than one a second!
> 
> I totally agree with your statement!
> 
> > It can be readable enough, with difficulty, without generating an error
> > flag.
> 
> Agreed
> 
> > Therefore, they may not be flagged unless they get above a
> > certain threshold.
> 
> Not flagged by who?  Smart is keeping track.  Spinrite / linux is not.
> 
> > When Spinrite tries to read the data, if it has
> > difficulty above a certain limit, I believe it relocates the data
> > somewhere else.
> 
> AIUI - False.  If the sector is ECC correctable, spinrite has no idea.
> 
> If the sector is not-ECC correctable the DRIVE marks the sector as
> pending relocation.  See the smart Current_Pending_Sector field.  For
> each unreadable sector, this is increased by 1.

Perhaps I misstated the above.  If Spinrite gets a perfect read, it
accepts the data and keeps it.  It then proceeds to do surface analysis
on the sector to confirm that it is safe to store the data.  Presumably,
this is when the data is inverted and written back.  At this point, if
the read was perfect, presumably the write will be.  It will then be
read, inverted again back to its original state, and written back.

However, if the original read is flawed, things change a bit.  ECC is
turned off and sector swapping is delayed while Dynastat kicks in to
start collecting its 2000 statistical samples.  The drive is prevented
from swapping the sector during this time.  Once that's done, the sector
is analyzed for safety, which would include numerous reads and writes.
This may cause a sector swap, and if so, the new sector will be the one
analyzed.  Once Spinrite is convinced that data storage is possible and
safe, the perfect data, if it was ever read, or the statistically
created data, is written to the sector.

> 
> When that sector is written (by linux or spinrite) then the sector is
> reallocated to a spare sector.  And the old sector is not used again.
> 
> fyi: hdparm has a way to force a write to Pending Sector and put new
> good data on it.  Thus spinrite could do this if it wanted to as well.
>  I certainly hope it is not doing so.
> 

I don't see why it would need to write to a sector that the drive wants
to swap, but don't know for sure.  It does have to prevent swapping
until it's finished data recovery.  But, I would assume the swap, if
needed, is always allowed before writing the new final data back.
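
For reference, the hdparm mechanism Greg is referring to looks roughly
like this (the LBA is hypothetical, and the write destroys whatever was
in the sector, so only ever point it at a sector you already know is
unreadable):

  # try to read the suspect sector; on a pending sector this errors out
  sudo hdparm --read-sector 123456 /dev/sdX
  # overwrite it with zeros, which gives the drive a chance to reallocate it
  sudo hdparm --write-sector 123456 --yes-i-know-what-i-am-doing /dev/sdX

Afterward, Current_Pending_Sector should drop, and Reallocated_Sector_Ct
may go up by one.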

> >  This may or may not raise a flag in the smart system.
> 
> It does.  see above.
> 
> > It also doesn't mean that the sector has been reallocated.
> 
> You imply a sector can be moved without it being reallocated.  I think
> that is wrong.  The only way to move the sector is to allocate a spare
> and use it instead of the original.
> 

What I meant was that the data will be read, recovered if needed, and
written back.  The final data may be written back to the original
sector, or a swapped sector.  I don't know what logic the drive uses to
determine this.

> >  The
> > intensive analysis of Spinrite inverts the data and writes it back,
> > reads it, inverts it again to its original state, then writes it back
> > again.
> 
> That is nice because it should allow the drive to identify magnetic
> "holes".  When found the drive itself is likely doing the spare sector
> allocation.
> 
> 
> > This forces the drive's firmware to evaluate the performance at
> > that point, and forces the surface to absorb both a 1 and 0 in turn at
> > that point.  Also, I believe that the magnetic fields deteriorate over
> > time.  I could probably corroborate that with some extensive research.
> 
> Agreed, but I often store hard drives offline for extended periods.
> We rarely see read failures for drives we put back on line.  So the
> deterioration is very slow and not likely to be an issue.
> 

It may be slow, but if the one file that crashes is your critical
program, database, or contract, that's still a problem.

> fyi: The DOD uses thermite in the drive platter area to heat the media
> to several hundred degrees.  When this happens the magnetism is
> released and the data is gone.
> 
> > Just anecdotally though, most of the files I've ever lost due to disk
> malfunctions seem to be things that were rarely, if ever, accessed.
> 
> Somewhat logical.  The drive's "smart" function doesn't get exposed to
> those sectors, so as the sector degrades / fails, it doesn't know
> about it.
> 
> Especially with laptop drives, you get physical damage as the flying
> head hits the platters from time to time.  To protect the platters,
> they are often actually coated with a fine coat of diamond dust.
> That's one reason laptop drives cost more.
> 
> > The read invert write read invert write cycle, if nothing else,
> > will ensure that all the magnetic bits are good and strong since they
> > are all ultimately rewritten.
> 
> True, but I think normal degradation is much slower than you imply.
> 

That's possible.  I couldn't find any studies on it.  However, rewriting
the data every 4 months eliminates the problem.  Now, you could debate
endlessly about the costs vs. benefits of doing that (time spent, etc.)
vs. doing nothing.  You could probably write books about it.

For my purposes - "I don't want no bit rot invadin' my drives!"  Said
while holding shotgun, etc. ;-)

> > There are basically 3 possibilities prior to running the diagnostic:
> >
> > 1) The drive has no serious (above threshold) errors either latent or
> > obvious. - In this case, every magnetic domain on the surface will be
> > refreshed, which is good, and will keep the read error rate as far below
> > the threshold limit as possible.  Also, the firmware will be forced to
> > evaluate the performance of every bit and byte.
> 
> For a drive you've treated with spinrite, what's your ECC_Recovered / POH ratio.
> 

I quoted some numbers while running the test above.  I have no
historical data, and have no idea what's appropriate.  I'm thrashing the
drive to death, so I'd expect the numbers to be insanely high.  Your
numbers probably reflect thousands of hours of very low utilization, to
use a CPU term.  Therefore, you'd have relatively few recoveries on
average per hour.  An interesting number that Spinrite quotes is the
average errors / million sectors, which shouldn't vary much with usage.
Steve says an increase in this can indicate imminent failure.
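
If anyone else wants to compute the ratio Greg is describing for their
own drive, something like this should do it (the device name is
hypothetical, and the attribute names vary somewhat by vendor):

  sudo smartctl -A /dev/sda | awk '
    /Power_On_Hours/         {poh = $10}
    /Hardware_ECC_Recovered/ {ecc = $10}
    END {if (poh) print ecc / poh, "ECC recoveries per power-on hour"}'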

> ie. Mine is 5000 recoveries per power on hour.  And I don't do
> anything to "maintain" it.  This is just my desktop machine.
> 
> > 2) The drive has latent errors but no warnings. - There may be areas
> > that are barely readable or that are not readable under normal
> > circumstances (but have never been attempted).  They will be read if
> > possible after extensive effort, and will be relocated to a different
> > part of the drive if needed.  This may or may not cause a sector
> > reallocation or generate any error messages.  Again, the magnetic fields
> > will be refreshed.
> 
> 
> >
> > 3) The drive has obvious errors and warnings. - In this case it is
> > likely that some data is unreadable by conventional means.  It is highly
> > likely that Spinrite will recover the data and save it elsewhere on the
> > drive, storing it in fresh strong magnetic domains.
> 
> I believe a smart long self test will read all of the sectors and
> identify those that are not ECC Recoverable.  I don't think it will
> actually reallocate them.
> 

I never could find out what that test does.
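
Whatever it does internally, kicking one off and reading the result
afterward is easy enough (the device name is hypothetical; the extended
test runs in the background and typically takes an hour or two):

  sudo smartctl -t long /dev/sda      # start the extended (long) self-test
  sudo smartctl -l selftest /dev/sda  # check the self-test log once it finishes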

> What spinrite likely does is read the sector in various ways.  ie many
> data recovery tools can read the sectors in reverse order.  This
> causes the drive to align the head slightly differently I believe.
> Due to that slight change, some bad sectors can be read.  So I
> actually do think spinrite could have some logic to do this that
> normal read logic would not have.
> 
> 

Yes.  See above.

> > Again, this may or
> > may not trigger sector reallocation.
> 
> I surely hope that writing to a sector that previously had read
> failures not handleable via ECC recovery triggers a reallocation.
> 

I would assume so too, but don't know if the drive has a certain
threshold.  I don't know if the program can force a reallocate, but I do
know it delays them while data recovery is going on.

> >  Spinrite will report these data
> > areas as recovered or unrecovered as appropriate.  The drive itself may
> > still be fully usable, if, for example, the data error was caused by a
> > power failure, but the drive was not damaged.  If sectors start getting
> > reallocated, I would agree that it's time to consider changing the drive
> > out, as I did with one of mine last night.

> I'm not so sure I agree.  A lot of reallocates are just physical
> platter issues.  It used to be that drives shipped new with lots of
> reallocated sectors.
> 
> Admittedly, new ones tend to have zero these days.
> 

I used to be more tolerant of, and less knowledgeable about, reallocated
sectors.  However, the Google study I mentioned said a drive is 16 to 21
times more likely to fail in the next 60 days after a reallocate event
(assuming the summary I read was correct).  I just swapped out a drive
last night which had only 2 bad sectors, just to be sure.  Who knows, it
might have lasted another year.  I am having Seagate replace it though.
The drive itself was a replacement, and it's only a year old.  I still
have until 2014 on my original warranty, so out it goes.
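
In the spirit of that Google study, the two attributes I now keep an eye
on are easy to check (device name hypothetical; non-zero raw values in
the last column are the warning sign):

  sudo smartctl -A /dev/sda | egrep 'Reallocated_Sector_Ct|Current_Pending_Sector'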

> 
> > Regardless, Spinrite can
> > often recover your data enough to boot the drive and get the data off
> > before decommissioning the drive.  The smart reporting screen of
> > Spinrite is somewhat hard to read, and I don't know if it reports sector
> > reallocation.  I would use the Ubuntu Disk Utility or gsmartcontrol /
> > smartctl as a final check to look for warning signs (now that I know
> > about it) even if Spinrite is happy.
> >
> > I'm not suggesting that everyone has to use the product, just sharing
> > some info that I feel might be helpful.  I have found the product useful
> > in the past.  To each his own.
> >
> > Sincerely,
> >
> > Ron
> 
> Greg
> 

Hopefully, this made some sense.  My brain is dead.  Signing off for the
night.

Ron

-- 

(PS - If you email me and don't get a quick response, you might want to
call on the phone.  I get about 300 emails per day from alternate energy
mailing lists and such.  I don't always see new messages very quickly.)

Ron Frazier

770-205-9422 (O)   Leave a message.
linuxdude AT c3energy.com


