<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body bgcolor="#ffffff" text="#000000">
I have a graphic design client with a 2U server running Fedora 11, now
upgraded to 12, sitting at a colo and handling their backups. The
server has 8 drives with Linux md RAID arrays and LVM on top of them.
The primary filesystems are ext4, and there is/was an LVM swap volume.<br>
<br>
I've had an absolutely awful experience with these Seagate 1.5 TB
drives, returning 10 of the original 14 because of an ever-increasing
SMART Reallocated_Sector_Ct from bad blocks. The server at the client's
office has a 3ware 9650 (I think) that has done a great job of handling
bad blocks from this same batch of drives, and it sent email
notifications as one of the drives grew more and more bad blocks. This
2U, though, is pure software RAID, and it has started locking up.<br>
<br>
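On this box the closest thing I can set up is smartd's mail
notifications; something along these lines in /etc/smartd.conf (the
self-test schedule and address below are just placeholders I'd adjust):<br>
<br>
<font face="Courier New, Courier, monospace"># watch all attributes on every detected drive, run short/long self-tests, mail on problems<br>
# schedule and address are placeholders<br>
DEVICESCAN -a -o on -S on -s (S/../.././02|L/../../6/03) -m admin@example.com<br>
# then on Fedora: chkconfig smartd on ; service smartd start</font><br>
<br>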
As a stabilizing measure I've disabled the swap space, on the theory
that the lockups were caused by failed reads/writes to swap. I have yet
to let the server run for a while and see whether that helped.<br>
<br>
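For the record, disabling it amounted to something like this (I don't
have the exact LV name in front of me, hence the lvdisplay check):<br>
<br>
<font face="Courier New, Courier, monospace">swapoff -a # stop using all active swap right now<br>
lvdisplay | grep -i swap # confirm which logical volume held the swap space<br>
# then comment out the swap line in /etc/fstab so it stays off after a reboot</font><br>
<br>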
However, I've been doing a lot of reading today on how md and LVM
handle bad blocks, and I'm really shocked. I found <a
href="http://linas.org/linux/raid.html">this article</a> (which may be
outdated) claiming that md relies heavily on the disk's firmware to
handle these problems, and that when rebuilding an array there are no
"common sense" integrity checks to ensure the right data gets
reincorporated into the healthy array. Then I read article after
article about drives silently corrupting data. It's turned my stomach.
Btrfs isn't ready for this, even though RAID5 support was recently
added, and I don't see it becoming a production-stable filesystem until
2011 at the earliest.<br>
<br>
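From what I can tell, the most md itself offers is a manual consistency
check; assuming one of the arrays is /dev/md0, something like:<br>
<br>
<font face="Courier New, Courier, monospace">echo check > /sys/block/md0/md/sync_action # read every stripe and compare copies/parity<br>
cat /proc/mdstat # watch the check progress<br>
cat /sys/block/md0/md/mismatch_cnt # nonzero afterwards means inconsistent stripes were found</font><br>
<br>
And as I understand it, writing "repair" instead of "check" just makes
the copies agree again; it can't tell which copy was the correct one,
which is exactly what worries me.<br>
<br>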
Am I totally wrong to suspect bad blocks as the cause of the lock-ups?
(syslog records nothing)<br>
Can md RAID be trusted with flaky drives?<br>
If it's the drives, then other than installing OpenSolaris and ZFS, how
do I make this server reliable?<br>
Any experiences with defeating mysterious lock-ups?<br>
<br>
Thanks!<br>
<br>
------------------------------SMART Data-----------------------------<br>
<font face="Courier New, Courier, monospace">[root@victory3 ~]# for
letter in a b c d e f g h ; do echo /dev/sd$letter; smartctl --all
/dev/sd$letter |grep Reallocated_Sector_Ct; done<br>
/dev/sda<br>
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail
Always - 8<br>
/dev/sdb<br>
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail
Always - 1<br>
/dev/sdc<br>
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail
Always - 0<br>
/dev/sdd<br>
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail
Always - 0<br>
/dev/sde<br>
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail
Always - 1<br>
/dev/sdf<br>
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail
Always - 0<br>
/dev/sdg<br>
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail
Always - 1<br>
/dev/sdh<br>
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail
Always - 0</font><br>
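<br>
I'll probably start tracking the pending/uncorrectable sector counters
too, since those point more directly at sectors the drives currently
can't read (attribute names as smartctl reports them on these drives;
adjust if yours differ):<br>
<br>
<font face="Courier New, Courier, monospace">for letter in a b c d e f g h ; do echo /dev/sd$letter; smartctl --all /dev/sd$letter | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'; done</font><br>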
</body>
</html>