[ale] new (to me) raid 4/5 failure mode
Greg Freemyer
greg.freemyer at gmail.com
Mon Aug 24 09:50:37 EDT 2009
All,
If you are using RAID 4 or 5, or are considering it, in an unreliable
environment, you may want to think about this. By unreliable I mean one
where your system fails at unpredictable times due to power loss, bad
hardware, kernel crashes, etc.
Anyway, here's the failure mode:
In normal 3-disk operation, p = d1 ^ d2 (^ is XOR), so we have redundancy.
If d1 dies, the array continues to run in degraded mode:
d2 ^ p ==> d1
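The parity relationship and the degraded-mode rebuild can be sketched in a few lines of Python. This is a toy model, not md or any real RAID implementation: each chunk is a short byte string, and the helper name `xor` and the chunk contents are illustrative assumptions.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks (hypothetical helper)."""
    return bytes(x ^ y for x, y in zip(a, b))

# One stripe of a 3-disk array; contents are arbitrary example values.
d1 = b"AAAA"        # data chunk on disk 1
d2 = b"BBBB"        # data chunk on disk 2
p = xor(d1, d2)     # parity chunk: p = d1 ^ d2

# Disk 1 dies: degraded mode rebuilds d1 on the fly from the survivors.
rebuilt_d1 = xor(d2, p)
print(rebuilt_d1)   # b'AAAA'
```

This works because XOR is its own inverse: d2 ^ (d1 ^ d2) = d1.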
Now if you write data to d2, p gets updated as well, so:
d2' ^ p' ==> d1 (the primes indicate new data values)
As expected d1 can still be recreated from d2' and p'.
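Here is a sketch of a degraded-mode write in the same toy model. It assumes a read-modify-write parity update (p' = p ^ d2 ^ d2'), which is one common way to compute the new parity without reading d1. The chunk values and the `xor` helper are illustrative.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks (hypothetical helper)."""
    return bytes(x ^ y for x, y in zip(a, b))

d1 = b"AAAA"            # lost with the failed disk; kept here only to check
d2 = b"BBBB"
p = xor(d1, d2)

# Degraded-mode write of new data to d2.
new_d2 = b"CCCC"
# Read-modify-write: cancel the old d2 out of p, then fold in the new d2.
new_p = xor(xor(p, d2), new_d2)

# Both writes completed, so d1 is still recoverable: d2' ^ p' ==> d1.
print(xor(new_d2, new_p))   # b'AAAA'
```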
But in general the d2 and p updates are not atomic with respect to each
other, so there is a short period of time where either:
d2' ^ p ==> garbage or
d2 ^ p' ==> garbage
So if a system crash or power failure occurs during that window, d1
becomes garbage, even though it was never written to by an application!
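The torn write can be simulated by applying only one of the two writes before "crashing". This uses the same toy model and the same read-modify-write assumption as a plain degraded-mode update. Names and values are illustrative.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks (hypothetical helper)."""
    return bytes(x ^ y for x, y in zip(a, b))

d1, d2 = b"AAAA", b"BBBB"
p = xor(d1, d2)

new_d2 = b"CCCC"
new_p = xor(xor(p, d2), new_d2)

# Crash after only one of the two writes reached the disks.
torn_data_only = xor(new_d2, p)     # d2' ^ p   (parity write lost)
torn_parity_only = xor(d2, new_p)   # d2 ^ p'   (data write lost)
print(torn_data_only, torn_parity_only)   # b'@@@@' b'@@@@'
```

In both cases the "rebuilt" d1 works out to d1 ^ d2 ^ d2'. In other words, d1 is corrupted in exactly the bit positions where d2 changed.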
Obviously, I don't mean the entire disk, just those d1 chunks that are
part of a partially updated stripe.
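To show that the damage is limited to the torn stripe, here is a sketch over several stripes of a degraded array. A crash hits only the stripe being written. The stripe count, contents, and helper name are illustrative assumptions.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks (hypothetical helper)."""
    return bytes(x ^ y for x, y in zip(a, b))

# Three stripes. Disk 1 has already failed; the original d1 chunks are
# kept only so the rebuild can be checked.
original_d1 = [b"AAAA", b"DDDD", b"GGGG"]
d2 = [b"BBBB", b"EEEE", b"HHHH"]
p = [xor(a, b) for a, b in zip(original_d1, d2)]

# An application rewrites d2 in stripe 1; the crash lands after the data
# write but before the matching parity write.
d2[1] = b"ZZZZ"

# Rebuild every d1 chunk and list the stripes that came back wrong.
rebuilt = [xor(b, q) for b, q in zip(d2, p)]
damaged = [i for i, (r, o) in enumerate(zip(rebuilt, original_d1)) if r != o]
print(damaged)   # [1]
```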
I had never considered that stable data sitting on a RAID 5 might
change randomly, even if it was never written to. I have never
been a fan of RAID 5. In fact, I only considered it a good choice for
low-end situations. I think the above rules it out for most of those.
I think RAID 6 would only have a similar issue in a dual-disk failure
mode, but I'm not positive.
Greg
--
Greg Freemyer
Head of EDD Tape Extraction and Processing team
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
Preservation and Forensic processing of Exchange Repositories White Paper -
<http://www.norcrossgroup.com/forms/whitepapers/tng_whitepaper_fpe.html>
The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com