<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 TRANSITIONAL//EN">

<HTML>

<HEAD>

  <META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=UTF-8">

  <META NAME="GENERATOR" CONTENT="GtkHTML/3.24.5">

</HEAD>

<BODY TEXT="#000000" BGCOLOR="#ffffff">

Thanks for the information. <BR>

<BR>

We tend to use iozone on new systems.&nbsp; Iozone has a diagnostic test mode where it writes a known file of a specified size in various ways, then re-reads and compares it.&nbsp; We experienced a really strange failure on a big Compaq server about 6 years ago where Linux didn't properly handle over 4GB of RAM, and just plain forgot about certain blocks it had written to RAM buffers but not to disk.&nbsp; Iozone was great for diagnosing that, as Compaq wanted to blame the Progress DBMS.&nbsp; Pointing out that iozone failed got us out of the blame loop.&nbsp; Iozone is used more for benchmarking, and we've found it handy to run on a new server to see if the performance is in line with what we'd expect. <BR>
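
<BR>

A minimal sketch of that write, re-read, and compare idea, in Python rather than iozone itself. The function names, block size, and scratch file are assumptions for illustration only: <BR>

<PRE>
```python
import hashlib
import os
import tempfile


def make_pattern(block_index, block_size):
    """Build a deterministic block whose contents depend on its index."""
    seed = hashlib.sha256(str(block_index).encode()).digest()
    repeats = block_size // len(seed) + 1
    return (seed * repeats)[:block_size]


def write_and_verify(path, total_blocks, block_size=4096):
    """Write known blocks to path, re-read them, return indexes that differ."""
    with open(path, "wb") as handle:
        for index in range(total_blocks):
            handle.write(make_pattern(index, block_size))
        handle.flush()
        os.fsync(handle.fileno())  # ask the OS to push its buffers to disk

    bad_blocks = []
    with open(path, "rb") as handle:
        for index in range(total_blocks):
            if handle.read(block_size) != make_pattern(index, block_size):
                bad_blocks.append(index)
    return bad_blocks


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        target = os.path.join(scratch, "pattern.dat")
        print("mismatched blocks:", write_and_verify(target, 256))
```
</PRE>

For a real test the file would need to be larger than RAM (or the cache dropped first), since a small file is simply re-read from the page cache. <BR>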

<BR>

It seems you really have to do both the low-level hardware burn-in, as you note below, and an OS-level test like Iozone to make sure the filesystem holds together. <BR>

<BR>


On Fri, 2012-03-02 at 14:43 -0500, Ron Frazier (ALE) wrote:<BR>

<BLOCKQUOTE TYPE=CITE>

    Hi Neal,<BR>

    <BR>

    I don't own any 1.5 TB drives, but I do have a few Seagate or Hitachi 1 TB drives that I bought within the last few years.&nbsp; They've been fine.&nbsp; I would recommend stress testing any new drive you get as follows before trusting it with data.&nbsp; I've almost always bought Seagate, and I almost always look for the 5 year warranty.<BR>

    <BR>

    Method A) Use a utility to write random data to the entire drive.&nbsp; The Ultimate Boot CD has some such things.&nbsp; Be careful not to erase your main system drive.&nbsp; Do this at least 6 times.&nbsp; This forces the drive controller to thoroughly evaluate each sector and determine if any are weak.&nbsp; It also, more or less, forces each bit on each sector to be written with different values at least a few times.<BR>
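
    <BR>

    A minimal sketch of the Method A idea (several passes of different random data, each read back and checked). It assumes a scratch file rather than a raw device, because pointing it at a real disk would erase that disk. All names here are hypothetical: <BR>

<PRE>
```python
import os
import random
import tempfile


def random_block(rng, size):
    """Return size pseudo-random bytes drawn from rng."""
    return rng.getrandbits(8 * size).to_bytes(size, "little")


def random_fill_pass(path, total_bytes, seed, block_size=65536):
    """Fill path with seeded random data, re-read it, return True on a match."""
    rng = random.Random(seed)
    with open(path, "wb") as handle:
        remaining = total_bytes
        while remaining != 0:
            chunk = random_block(rng, min(block_size, remaining))
            handle.write(chunk)
            remaining -= len(chunk)
        handle.flush()
        os.fsync(handle.fileno())

    rng = random.Random(seed)  # replay the same byte stream for checking
    with open(path, "rb") as handle:
        remaining = total_bytes
        while remaining != 0:
            expected = random_block(rng, min(block_size, remaining))
            if handle.read(len(expected)) != expected:
                return False
            remaining -= len(expected)
    return True


def stress(path, total_bytes, passes=6):
    """Run several passes, each with different data; return per-pass results."""
    return [random_fill_pass(path, total_bytes, seed) for seed in range(passes)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print(stress(os.path.join(scratch, "fill.dat"), 1000000))
```
</PRE>

    Seeding the generator per pass means the expected data can be regenerated for the check instead of being held in memory. <BR>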

    <BR>

    Method B) Write random data to the drive once, just so it's not all zeros.&nbsp; Then, use the SpinRite utility from Gibson Research to run a level 4 surface analysis 5-6 times.&nbsp; The SpinRite utility will read each sector, invert it, write it, read it, invert it, and write it back.&nbsp; This accomplishes the same purpose as Method A), but is more thorough and predictable in that every single bit is tested both as a zero and a one.&nbsp; Using SpinRite has another advantage as outlined below.<BR>
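
    <BR>

    The read, invert, write, read, invert, write-back cycle described above can be sketched roughly as follows. This is not SpinRite's actual code, only an illustration on an ordinary file, and the 512-byte block size and function names are assumptions: <BR>

<PRE>
```python
import os
import tempfile

SECTOR_SIZE = 512  # assumed block size for illustration


def invert(data):
    """Flip every bit in data."""
    return bytes(byte ^ 0xFF for byte in data)


def invert_cycle(path, sector_size=SECTOR_SIZE):
    """Write each block inverted, then restore it, checking every write.

    Returns the offset of every write that did not read back as written.
    """
    failures = []
    with open(path, "r+b") as handle:
        offset = 0
        while True:
            handle.seek(offset)
            original = handle.read(sector_size)
            if not original:
                break
            flipped = invert(original)
            for pattern in (flipped, original):  # inverted first, then restore
                handle.seek(offset)
                handle.write(pattern)
                handle.flush()
                os.fsync(handle.fileno())
                handle.seek(offset)
                if handle.read(len(pattern)) != pattern:
                    failures.append(offset)
            offset += len(original)
    return failures


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        target = os.path.join(scratch, "sectors.dat")
        with open(target, "wb") as handle:
            handle.write(os.urandom(8192))
        print("failed writes at offsets:", invert_cycle(target))
```
</PRE>

    Because every block is written once inverted and once in its original form, each bit gets stored as both a 0 and a 1, and the file ends up with its original contents. <BR>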

    <BR>

    After doing A) or B), use Disk Utility in Linux or similar to run a long SMART surface test, which is read only, I think.&nbsp; This assumes the computer and drive allow you to access the SMART subsystem.&nbsp; This test should pass with no errors.&nbsp; There should also be no bad sectors reported.&nbsp; If there are bad sectors, I would consider RMAing the drive.<BR>
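
    <BR>

    For those who prefer a script to Disk Utility, here is a rough sketch using the smartctl tool from smartmontools. It assumes smartctl is installed and run as root, and that the attribute table has the usual ten-column layout with the raw value in the last column. The helper names and the list of watched attributes are illustrative choices: <BR>

<PRE>
```python
import subprocess

# Attributes commonly associated with bad or failing sectors (an assumed list).
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector",
           "Offline_Uncorrectable")


def start_long_test(device):
    """Ask the drive to start its extended self-test."""
    return subprocess.run(["smartctl", "-t", "long", device],
                          capture_output=True, text=True)


def read_attributes(device):
    """Return the text of the vendor attribute table for device."""
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True)
    return result.stdout


def bad_sector_counts(attribute_text):
    """Pick the raw values of the watched attributes out of smartctl -A output."""
    counts = {}
    for line in attribute_text.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[1] in WATCHED and fields[9].isdigit():
            counts[fields[1]] = int(fields[9])
    return counts
```
</PRE>

    The long test runs inside the drive and can take hours; its result shows up later in the self-test log (smartctl -l selftest). Any nonzero count from bad_sector_counts would be a reason to look more closely at the drive. <BR>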

    <BR>

    After all this, partition and format the drive and use it for data.&nbsp; From then on, I would run SpinRite 2-3 times a year on the drive.&nbsp; This is important.&nbsp; The SpinRite algorithm is non-destructive.&nbsp; You can run it on a drive with data on it.&nbsp; This actually helps prevent errors by strengthening and refreshing all the magnetic domains, so your data is not subject to fade over time (bit rot).&nbsp; Also, it gives the controller another chance to review each sector both for reading and writing and determine if any are going bad.&nbsp; By doing these procedures, I've kept many of my drives running more than 5 years, barring any mechanical problems.&nbsp; Running a long SMART test instead of SpinRite will not accomplish the same thing.&nbsp; While it will test each sector to see if it can be read, it will not test each sector to make sure every bit can be written and read with both 0 and 1.<BR>

    <BR>

    If sectors are difficult to read, SpinRite will work as hard as possible to recover the data rather than just discarding the entire sector.&nbsp; It tries to read finicky sectors up to 2000 times, as I recall.&nbsp; Note that SpinRite works at the SECTOR level, not the file system level.&nbsp; If the file system is screwed up, reading all the data on each sector won't help, because that data is corrupt.&nbsp; While I have been known to run FSCK or CHKDISK (Windows) when having problems, it is probably better to first run SpinRite to make sure the sectors are as readable as possible from a magnetic point of view, then run FSCK or CHKDISK to correct any file system errors.<BR>

    <BR>

    Drives certainly can and do fail later in life.&nbsp; Sometimes, this exhaustive testing will expose pending problems, such as if SpinRite just cannot read some sectors, or if the SMART test reveals bad sectors.&nbsp; This will give you a chance to recover the data before the drive totally blows up.&nbsp; SpinRite has a SMART screen, but I don't put too much credence in that part of the program.&nbsp; The reason is that every manufacturer does SMART differently and they don't always publish their design docs.&nbsp; At the time he designed SpinRite, Steve had to reverse engineer the data on the SMART screen.&nbsp; It's not always set in stone.&nbsp; I'd trust the Linux SMART test in Disk Utility more for that purpose.<BR>

    <BR>

    By the way, this advice is for magnetic drives.&nbsp; Do not use it on SSDs, as you will probably accelerate the wear on the unit, and most of the positive benefits don't exist.&nbsp; You can use it on a hybrid SSD / magnetic drive.<BR>

    <BR>

    Sincerely,<BR>

    <BR>

    Ron<BR>

    <BR>

    <BR>

    On 3/2/2012 11:18 AM, Neal Rhodes wrote: <BR>

    <BLOCKQUOTE TYPE=CITE>

        I've gone ahead and ordered an HP Core i3 system to be our next CentOS home/office server. <BR>

        <BR>

        It's got a 1.5TB drive; normally on these off-lease units I'd buy two brand new drives and mirror them.&nbsp; Or that's what we've done with the last 3 Linux servers, all of which are still technically functioning since Fedora Core 1. <BR>

        <BR>

        This drive is likely about a year old, so I'm thinking I'll just buy a new 1.5TB drive and install CentOS to mirror the primary. <BR>

        <BR>

        When I look at the crop of 1 - 1.5TB drives on TigerDirect and read the reviews, they seem to be uniformly terrible - DOA,&nbsp; failed after 3 weeks, replacement failed after a week, etc.&nbsp; Seagate seems to be the worst, although WD not too far behind. <BR>

        <BR>

        Ummm, isn't one of the primary selling features of a disk drive that it's not supposed to blow up and take down all your data with it?&nbsp;&nbsp;&nbsp; Has there been a massive quality slip in the last couple years since I last bought drives?&nbsp;&nbsp;&nbsp; Seriously -&nbsp; I can lose a power supply, a motherboard, a display - you name it, and once I replace it I can expect to still have the data.&nbsp;&nbsp;&nbsp; Yes, I should do backups, and I do, and yes, I should mirror the drives, and I do.&nbsp;&nbsp;&nbsp; I should do SMARTD monitoring and I do.&nbsp; But isn't this like selling tires that tend to shred randomly?&nbsp;&nbsp;&nbsp; Isn't not blowing up catastrophically with no warning beforehand a basic selling point for disk drives?&nbsp;&nbsp;&nbsp; What's the point of mirroring if the odds are good that both drives will fail completely the same week?&nbsp;&nbsp; What's the point of SMARTD monitoring if the darn drive quits without warning? <BR>

        <BR>

        Does anybody make a decent drive in that size range?&nbsp;&nbsp;&nbsp;&nbsp; <BR>

        <BR>

        I'm thinking that not even considering economy,&nbsp; my old theory of buying a pair of new identical drives may not be wise anymore, and sticking with one drive that has lasted over a year and one new drive is a better plan. <BR>

        <BR>

        Thoughts? <BR>

        <BR>

        Neal<BR>

    </BLOCKQUOTE>

    <BR>

    <BR>

    <BR>

<PRE>

-- 


(PS - If you email me and don't get a quick response, you might want to

call on the phone.  I get about 300 emails per day from alternate energy

mailing lists and such.  I don't always see new messages very quickly.)


Ron Frazier


770-205-9422 (O)   Leave a message.

linuxdude AT c3energy.com

</PRE>

</BLOCKQUOTE>

</BODY>

</HTML>