[ale] ATL Colocation and file server suggestions

Ken Ratliff forsaken at targaryen.us
Tue Jan 20 12:55:35 EST 2009


On Jan 20, 2009, at 8:58 AM, Jeff Lightner wrote:

> Seeing two mirrors drop out in RAID10 shouldn't be an issue so long as
> you have two spares, because it should be able to recreate both
> mirrors, as it is both mirrored and striped. (Losing two in RAID1
> would of course be an issue, assuming you didn't have a 3rd mirror.)
> What hardware array was being used that dropping two mirrors caused a
> problem for RAID10?

Maybe I didn't say that clearly enough - I've seen a RAID 10 lose both  
drives of a single mirror (in other words, the stripe set was broken,  
total data loss, chaos, screaming, and finally a restore from backup!)  
multiple times. And yes, I felt like God hated me, along with a few  
other supernatural entities.
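
For anyone who hasn't watched it happen: in a 4-drive RAID 10 the disks
pair off, so it matters very much *which* two die. A quick sketch with
Linux md and made-up device names (the default near-2 layout mirrors
adjacent devices):

    # sda1+sdb1 form one mirror, sdc1+sdd1 the other, striped together
    mdadm --create /dev/md0 --level=10 --layout=n2 --raid-devices=4 \
        /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
    # lose sda1 and sdc1: each mirror still has a live half, array survives
    # lose sda1 and sdb1: that whole mirror is gone and the stripe is toast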

> I'll have to say that even if having two mirrors drop in RAID10 were a
> problem, it must be a sign that God hates the person that had it if
> the two drives that failed (assuming it was only two) happened to be
> the ones that were mirrors of each other. We've been using RAID10 for
> the last 4 years since our last major RAID5 fiasco and have not seen
> such a failure.

Hehe, I've noticed that RAID level preference seems to be a lot like OS
preference. You are shaped by your experiences and react based on them.
I don't trust RAID 10 (though I'll use it when I can't use RAID 5,
i.e. for a database server) and I've never had that much of a problem
with RAID 5. The closest call was a 6TB array in which one drive died
and another was showing DRIVE-ERROR, but it made it through the first
rebuild to replace the dead drive, and then through the second to
replace the one showing the error, like a champ.

If I had my way, I'd yank the drives out of the servers entirely, just
build a nice sexy SAN and export disks to the servers via iSCSI. But
that's... expensive.
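
The software side of it is cheap enough, at least; it's the hardware
that hurts. Something like this gets you a poor man's version with tgt
on the storage box and open-iscsi on the server (IQN and IPs made up):

    # storage box: create a target and export /dev/sdb as LUN 1 (tgt)
    tgtadm --lld iscsi --mode target --op new --tid 1 \
        --targetname iqn.2009-01.us.example:storage.disk1
    tgtadm --lld iscsi --mode logicalunit --op new --tid 1 --lun 1 -b /dev/sdb
    tgtadm --lld iscsi --mode target --op bind --tid 1 -I ALL

    # server: discover the target and log in (open-iscsi)
    iscsiadm -m discovery -t sendtargets -p 192.168.0.10
    iscsiadm -m node --targetname iqn.2009-01.us.example:storage.disk1 \
        -p 192.168.0.10 --login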

> -----Original Message-----
> From: ale-bounces at ale.org [mailto:ale-bounces at ale.org] On Behalf
> Of Pat Regan
> Sent: Tuesday, January 20, 2009 2:56 AM
> To: ale at ale.org
> Subject: Re: [ale] ATL Colocation and file server suggestions
>
> Ken Ratliff wrote:
>>> However, software RAID 1, 10 is excellent and performance comparable
>>> with a hardware card.
>>
>> I still prefer to do RAID 10 on hardware. I've found software RAID to
>> be pretty finicky, drives dropping out of the array for no good
>> reason, and you don't notice it until the slight performance hit from
>> the rebuild makes you go 'hm.'
>
> If drives are randomly dropping out of Linux software RAID, something
> is wrong. I can only recall having two machines that had random drives
> dropping out of MD devices. Both showed errors when running memtest86.
> One was a bad CPU; I believe the other was bad RAM (IIRC).
>
>> I actually don't like RAID 10 at all. I'd rather toss the 4 drives
>> into a RAID 5 and get more space. Sure, a RAID 10 will allow you to
>> survive 2 dead drives, as long as it's the right 2 drives. I've seen
>> both drives of one mirror fail in a RAID 10 a few times, and that has
>> pretty much the same result as 2 dead drives in a RAID 5.
>
> Redundancy isn't the first reason to choose RAID 10 over RAID 5. If it
> were, everyone would just choose RAID 6, since that would let you lose
> any two drives.
>
> RAID 5 has a terrible write performance problem. Doing a small random
> uncached write to a RAID 5 involves reading the old data and old
> parity before the new data and new parity can be written.
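
(To put numbers on what Pat's describing: the parity update is just
XOR, but it costs two reads and two writes for one logical write. A toy
sketch, byte values made up:)

    # new_parity = old_parity XOR old_data XOR new_data
    old_data=0xA5; old_parity=0x0F; new_data=0x3C
    printf 'new parity: 0x%02X\n' $(( old_parity ^ old_data ^ new_data ))
    # 2 reads (old data, old parity) + 2 writes (new data, new parity)
    # = 4 disk I/Os per small write, vs. 2 writes on RAID 1/10.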
>
>> Software RAID 1 I have no problem with, though. It's quick, easy, the
>> performance hit is negligible unless you have something that's really
>> pounding the disk I/O, and as someone else mentioned, being able to
>> split the mirror and use the halves as fully functional drives does
>> occasionally have its uses.
>
> Hardware RAID 1 shouldn't have a write performance penalty.  Software
> RAID 1 (or 10) requires double the bus bandwidth for writes.  I can't
> speak for all implementations, but Linux MD RAID 1 spreads reads out
> over all drives in the raid set.
>
>> Yeah, we found out the hard way that software RAID 5 is a very, very
>> bad idea, especially if you're running it on a high-activity web
>> server. After enough times of having a drive in software RAID 5 die
>> before you're done rebuilding from the previous drive failure, you
>> kind of learn that maybe this isn't such a good idea (or you tell the
>> night crew to turn Apache off so that the array can rebuild in peace,
>> but that's not something properly spoken of in public!). The
>> performance hit of software RAID 5 just isn't worth it.
>
> Your slow rebuilds likely had nothing to do with the performance of
> software RAID 5.  I would imagine you needed to tweak
> '/proc/sys/dev/raid/speed_limit_min' up from the default of 1MB/sec.
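
(For the archives, since I had to look these up myself: the knobs Pat
means are the md sysctls, values in KB/s. The numbers below are just
what I'd try first, not gospel:)

    # raise the rebuild floor from the 1 MB/s default to 50 MB/s
    echo 50000  > /proc/sys/dev/raid/speed_limit_min
    # and let it use up to ~200 MB/s when the box is otherwise idle
    echo 200000 > /proc/sys/dev/raid/speed_limit_max
    # keep an eye on rebuild progress
    watch cat /proc/mdstat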
>
> There is very little reason for a hardware controller to beat Linux MD
> at RAID 5, especially on modern hardware.  It only requires one more
> drive worth of bus bandwidth than a hardware controller would require.
> Processors have always been able to compute parity faster than current
> hardware cards.  dmesg on my laptop tells me that I can compute RAID 6
> parity at 2870 megabytes per second.
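
(You can check the same number on your own box; the kernel benchmarks
its parity code when the RAID modules load, so something like this
should dig it out of the log:)

    # shows the xor/raid6 algorithm benchmarks from module load
    dmesg | grep -i -E 'raid6|xor'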
>
> I am not saying software RAID is for everyone. It has other advantages
> besides cost, but if you have a large budget those advantages aren't
> as useful :).
>
>> Now with that being said, no form of RAID is truly safe. I had a
>> server today drop both drives in one of its RAID 1s. They were older
>> 36 gig SCSIs, so it was about time anyway, but losing both of them
>> meant I got to spend time flattening the box and reinstalling it.
>> This is also why I try to avoid using drives from the same
>> manufacturer and batch when building arrays. If you don't, you better
>> pray to God that the rebuild completes before the next one dies. It's
>> said that RAID is no substitute for a proper backup, and that's true.
>> (And my life being somewhat of an essay in irony, the box that
>> dropped both drives in the mirror today was being used as a backup
>> server.)
>
> This paragraph reminded me of three(!) things. First, RAID is not a
> backup. You know this, but lots of people don't.
>
> Second, have you ever had the annoyance of replacing a failed drive
> with another of the same make/model, only to find that the replacement
> drive is in fact smaller than the failed drive? Whenever I use
> software RAID I sacrifice a few percent off the end of the drive just
> to keep this from happening.
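
(I've started doing the same thing. One way to do it, sizes made up,
leaving the last couple percent of the disk unused so a slightly
smaller replacement still fits:)

    # partition to 98% of the disk instead of 100%
    parted -s /dev/sdb mklabel gpt
    parted -s /dev/sdb mkpart primary 1MiB 98%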
>
> Third, software RAID is managed at the partition level, which has been
> helpful for me in the past. On smaller boxes I've often set up a
> relatively small RAID 1 or 10 at the front of the drives and RAID 5
> on the rest.
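
(Concretely, assuming four identically partitioned disks, hypothetical
device names:)

    # small 4-way RAID 1 across the first partitions, e.g. for /
    mdadm --create /dev/md0 --level=1 --raid-devices=4 /dev/sd[abcd]1
    # RAID 5 across the big second partitions for bulk storage
    mdadm --create /dev/md1 --level=5 --raid-devices=4 /dev/sd[abcd]2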
>
> My home media server is set up like this, and so is my personal Xen
> host that I have in a colo. I'm very budget-conscious when I spend my
> own dollars. The Xen server has an uptime of 300 days right now with
> no RAID failures :).
>
>> (Also, I'm not preaching at you, Jim, I'm sure you know all this
>> crap, I'm just making conversation!)
>
> I like conversation, and I could just as easily be on your side of
> this one. :)
>
>>> RAID 1 recovery is substantially quicker, and drives are low-cost
>>> enough to not need the N-1 space of RAID 5.
>>
>> All depends on your storage needs. We have customers with 4 TB
>> arrays, 6 TB arrays, and one with an 8.1 TB array (which presents
>> some interesting challenges when you need to fsck the volume... why
>> we used reiser for the filesystem on that array, I have no idea).
>> Those are a little hard to do in RAID 1 :)
>
> I'm up near 4 TB at home. That isn't even a big number anymore! :)
>
> I had 4 TB back in 2001. That was mighty expensive, though, and it
> sure wasn't a single volume. I only mention this so you don't just
> think I'm some punk with a RAID 5 on his TV blowing smoke :)
>
> Pat
