[ale] ATL Colocation and file server suggestions

Pat Regan thehead at patshead.com
Tue Jan 20 02:56:17 EST 2009


Ken Ratliff wrote:
>> However, software RAID 1, 10 is excellent and performance competitive
>> with a hardware card.
> 
> I still prefer to do RAID 10 on hardware. I've found software raid to  
> be pretty finicky, drives dropping out of the array for no good  
> reason, and you don't notice it until the slight performance hit for  
> the rebuild makes you go 'hm.'

If drives are randomly dropping out of Linux software RAID, something is
wrong.  I can only recall two machines that had random drives dropping
out of MD devices.  Both showed errors when running memtest86: one had a
bad CPU, and I believe the other had bad RAM (IIRC).

> I actually don't like RAID 10 at all. I'd rather toss the 4 drives  
> into a RAID5 and get more space. Sure, a RAID 10 will allow you to  
> survive 2 dead drives, as long as it's the right 2 drives. I've seen  
> both drives of one mirror fail in a RAID 10 a few times, and that has  
> pretty much the same result as 2 dead drives in a RAID 5.

Redundancy isn't the first reason to choose RAID 10 over RAID 5.  If it
were, everyone would just choose RAID 6 since that would let you lose
any two drives.

RAID 5 has a terrible write performance problem.  A small random
uncached write turns into a read-modify-write: the old data block and
the old parity block have to be read before the new data and new parity
can be written, so one logical write costs four disk operations (or a
read of the rest of the stripe from every other drive).
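
If you want to see the penalty for yourself, a random-write benchmark
against a filesystem on the RAID 5 versus one on a RAID 1/10 makes it
obvious.  A rough sketch with fio, if you have it handy (the mount
point, device names, and sizes here are just placeholders):

  # Same 4k random-write job against each array; compare the IOPS.
  fio --name=randwrite --directory=/mnt/raid5 --size=1G \
      --rw=randwrite --bs=4k --direct=1 --runtime=60 --time_based

  # While it runs, iostat shows the member disks doing *reads* even
  # though the workload is pure writes -- that's the parity update.
  iostat -x 5 /dev/sd[abcd]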

> Software RAID1 I have no problem with though. It's quick, easy, the  
> performance hit is negligible unless you have something that's really  
> pounding the disk i/o and as someone else mentioned, being able to  
> split the mirror and use them as fully functional drives does  
> occasionally have its uses.

Hardware RAID 1 shouldn't have a write performance penalty.  Software
RAID 1 (or 10) requires double the bus bandwidth for writes.  I can't
speak for all implementations, but Linux MD RAID 1 spreads reads out
over all drives in the raid set.
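
You can watch that read balancing happen.  A sketch (md0 and the sd*
names are placeholders for whatever your mirror is built from); a
single sequential reader tends to stick to one disk, so start a couple
of concurrent readers:

  dd if=/dev/md0 of=/dev/null bs=1M count=2048 &
  dd if=/dev/md0 of=/dev/null bs=1M skip=4096 count=2048 &

  # Both mirror members should show read traffic; writes, of course,
  # hit every member, which is where the doubled bus bandwidth goes.
  iostat -x 5 /dev/sda /dev/sdb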

> Yeah, we found out the hard way that software RAID5 is a very very bad  
> idea, especially if you're running it on a high activity web server.  
> After enough times of having a drive in software raid5 die before  
> you're done rebuilding from the previous drive failure, you kind of  
> learn that maybe this isn't such a good idea (or you tell the night  
> crew to turn apache off so that the array can rebuild in peace, but  
> that's not something properly spoken of in public!). The performance  
> hit for software RAID5 just isn't worth implementing it.

Your slow rebuilds likely had nothing to do with the performance of
software RAID 5.  I would imagine you needed to tweak
'/proc/sys/dev/raid/speed_limit_min' up from the default of 1MB/sec.
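
The knobs are easy to get at.  A sketch (the 50000 figure is just an
example; values are in KB/sec, so pick what your disks and workload can
stand):

  # Current floor and ceiling for rebuild speed:
  cat /proc/sys/dev/raid/speed_limit_min
  cat /proc/sys/dev/raid/speed_limit_max

  # Raise the floor so a busy array still rebuilds at a decent clip:
  echo 50000 > /proc/sys/dev/raid/speed_limit_min
  # or, equivalently:
  sysctl -w dev.raid.speed_limit_min=50000

  # Watch the rebuild progress:
  cat /proc/mdstat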

There is very little reason for a hardware controller to beat Linux MD
at RAID 5, especially on modern hardware.  MD only needs one more
drive's worth of bus bandwidth than a hardware controller would, and
host CPUs have long been able to compute parity faster than the
processors on hardware RAID cards.  dmesg on my laptop tells me it can
compute RAID 6 parity at 2870 megabytes per second.
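
If you want the numbers for your own hardware, the kernel benchmarks
its XOR and RAID 6 routines when the RAID modules load and logs the
results (the exact message text varies by kernel version):

  dmesg | grep -iE 'raid6|xor'

  # If nothing shows up, the module probably isn't loaded yet:
  modprobe raid456 && dmesg | tail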

I am not saying software RAID is for everyone.  It has other advantages
besides cost, but if you have a large budget those advantages aren't as
useful :).

> Now with that being said, no form of RAID is truly safe. I had a  
> server today drop both drives in one of its RAID1s. They were older
> 36 gig SCSI's, so it was about time anyway, but losing both of them  
> meant I got to spend time flattening the box and reinstalling it. This  
> is also why I try to avoid using drives from the same manufacturer and  
> batch when building arrays, as well. If you don't, you better pray to  
> god that the rebuild completes before the next one dies. It's said  
> that RAID is no substitute for a proper backup, and that's true. (And  
> my life being somewhat of an essay in irony, the box that dropped both  
> drives in the mirror today was being used as a backup server.)

This paragraph reminded me of three(!) things.  First, RAID is not a
backup.  You know this, but lots of people don't.

Second, have you ever had the annoyance of replacing a failed drive
with another of the same make and model, only to find the replacement
is actually smaller than the drive it replaces?  Whenever I use
software RAID I sacrifice a few percent off the end of each drive just
to keep this from happening.
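
Two ways to leave that headroom, sketched here with made-up sizes and
device names: partition a bit short of the end of the disk and build on
the partitions, or tell mdadm up front to use less than the full
component size:

  # --size is per component, in kilobytes; this leaves a few GB of
  # slack at the end of each member (the figure is just an example).
  mdadm --create /dev/md0 --level=1 --raid-devices=2 \
        --size=970000000 /dev/sda1 /dev/sdb1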

Third, another aspect of software RAID has been helpful to me in the
past: it is managed at the partition level rather than the whole-disk
level.  On smaller boxes I've often set up a relatively small RAID 1 or
10 at the front of the drives and a RAID 5 on the rest.
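
A sketch of that layout on four disks (device names are placeholders):
a small partition at the front of each disk for the RAID 10, and the
rest of each disk for the RAID 5:

  mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sd[abcd]1
  mdadm --create /dev/md1 --level=5  --raid-devices=4 /dev/sd[abcd]2

  # Sanity check:
  cat /proc/mdstat
  mdadm --detail /dev/md0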

My home media server is set up like this, and so is my personal Xen
host that I have in a colo.  I'm very budget-conscious when I'm
spending my own dollars.  The Xen server has an uptime of 300 days
right now with no RAID failures :).

> (Also, I'm not preaching at you, Jim, I'm sure you know all this crap,  
> I'm just making conversation!)

I like conversation, and I could just as easily be on your side of this
one.  :)

>> RAID 1 recovery is substantially quicker and drives
>> are low cost enough to not need the N-1 space of RAID 5.
> 
> All depends on your storage needs. We have customers with 4 TB arrays,  
> 6 TB arrays, and one with an 8.1 TB array (which presents some  
> interesting challenges when you need to fsck the volume.... why we  
> used reiser for the filesystem on that array, I have no idea). Those  
> are a little hard to do in RAID 1 :)

I'm up near 4 TB at home.  That isn't even a big number anymore! :)

I had 4 TB back in 2001.  That was mighty expensive, though, and it sure
wasn't a single volume.  I only mention this so you don't just think I'm
some punk with a RAID 5 on his TV blowing smoke :)

Pat
