[ale] new (to me) raid 4/5 failure mode

Pat Regan thehead at patshead.com
Mon Aug 24 10:54:08 EDT 2009


Greg Freemyer wrote:
> If you are using raid 4 or 5 or considering it in unreliable
> environments, you may want to think about this.  By unreliable I mean
> your system fails at unpredictable times due to power, bad hardware,
> kernel crashes, etc.

RAID helps protect your data from disk failures and not much else.  You
can increase your reliability with a UPS, a battery-backed RAID
controller, and multiple disk controllers.

> But in general d2 and p updates are non-atomic in relation to each
> other, so there is a short period of time where either:
> 
> d2' ^ p ==> garbage or
> d2 ^ p' ==> garbage

If your application is calling fsync, the disks in the array should
never be in this state.  If they are, something is probably wrong.  An
fsync call isn't supposed to return until the data is actually on the
platter.

If your application isn't calling fsync then the data must not be that
important :).
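
If it helps, here is a rough sketch of what I mean by "calling fsync"
(Python, file name made up); the write isn't treated as done until
fsync() returns:

    import os

    def durable_write(path, data):
        # Open, write, and don't return until fsync says the data is
        # on stable storage.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            os.write(fd, data)
            os.fsync(fd)  # blocks until the kernel/drive report the write as durable
        finally:
            os.close(fd)

    durable_write("/tmp/important.dat", b"payload")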

> So if a system or power fail occurs, d1 becomes garbage, even though
> it was never written to by an application!

d1 was garbage the moment it dropped out of the array :).

> Obviously, I don't mean the entire disk.  Just those d1 chunks that
> are part of a partially updated stripe.

Once a disk drops out of an array I would expect all of the data on that
drive to be bad.

> I had never considered that stable data sitting on a raid 5 might
> change randomly, even if they were never written to.  I have never
> been a fan of raid 5.  In fact, I only considered it a good choice for
> low-end situations.  I think the above rules it out for most of those.

There's nothing special about RAID 5 in this regard.  Any time you have
multiple disks they can get out of sync during loss of power.
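
To make Greg's scenario concrete, here is a toy version of the RAID 5
math (plain XOR on byte strings, nothing to do with the real md code).
d1 is never written, but it still comes back wrong if a crash splits
the d2 and p updates:

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    d1 = b"AAAA"      # data chunk on disk 1 -- never rewritten by anyone
    d2 = b"BBBB"      # data chunk on disk 2
    p  = xor(d1, d2)  # parity chunk on disk 3

    # Healthy array: losing disk 1 is fine, d1 comes back from d2 ^ p.
    assert xor(d2, p) == d1

    # A crash lands between the d2 write and the p write,
    # so d2 is new but p is still the old parity.
    d2_new = b"CCCC"

    # Lose disk 1 *after* that crash and try to rebuild it:
    print(xor(d2_new, p) == d1)  # False -- d1 is reconstructed as garbage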

Silent data corruption is the reason ZFS and other new filesystems like
btrfs write checksums for every block to the disk.  Read up on Sun's
testing regarding single-bit errors per TB of data...  It's pretty scary.
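
The idea is roughly this (a toy sketch, not the real on-disk layout --
ZFS actually keeps the checksum in the parent block pointer rather than
next to the data):

    import zlib

    def write_block(store, addr, data):
        # Store the block together with a checksum of its contents.
        store[addr] = (data, zlib.crc32(data))

    def read_block(store, addr):
        data, csum = store[addr]
        if zlib.crc32(data) != csum:
            raise IOError("silent corruption detected in block %d" % addr)
        return data

    disk = {}
    write_block(disk, 0, b"hello")
    disk[0] = (b"hellp", disk[0][1])   # flip data behind the filesystem's back
    read_block(disk, 0)                # raises instead of silently returning bad data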

> I think raid 6 would only have a similar issue in a dual disk failure
> mode, but I'm not positive.

RAID 5 and 6 are both really only suitable for workloads that are
mostly read intensive.  They both suffer from the same "write hole"
problem, and writes are expensive either way: a small write has to read
the old data and old parity, compute the new parity, and write both
back, while a full-stripe write touches every disk in the array.

Sequential writes aren't so bad; random writes are a killer :).
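
Here is a sketch of the usual read-modify-write parity update for a
small write -- the point is that one logical write turns into two reads
plus two writes:

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    # Read-modify-write for a small RAID 5 write: one logical write costs two
    # reads (old data, old parity) and two writes (new data, new parity).
    def rmw_parity(old_data, old_parity, new_data):
        return xor(xor(old_parity, old_data), new_data)

    d1, d2 = b"AAAA", b"BBBB"
    p = xor(d1, d2)

    d2_new = b"CCCC"
    p_new  = rmw_parity(d2, p, d2_new)

    assert xor(d1, d2_new) == p_new  # parity is consistent with the whole stripe again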

Pat
