[ale] new (to me) raid 4/5 failure mode

Greg Freemyer greg.freemyer at gmail.com
Mon Aug 24 16:31:47 EDT 2009


On Mon, Aug 24, 2009 at 3:55 PM, Pat Regan <thehead at patshead.com> wrote:
> Greg Freemyer wrote:
>>> If your application isn't calling fsync then the data must not be that
>>> important :).
>>
>> fsync does not address this issue, which is a small number of
>> milliseconds of vulnerability for most disk writes.
>
> If fsync doesn't address the issue there is either a hardware or driver
> problem.  fsync is not supposed to return until the disk hardware has
> flushed the write to the platter.

If I have, in C code:  write(); fsync();

That clearly takes some number of milliseconds.  This failure mode
requires an unexpected failure during that time period.  fsync may
shorten the window; it is not capable of eliminating it.
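
To make the window concrete, here is a minimal sketch of that
sequence (the file name and buffer are made up for illustration):

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    char buf[4096] = { 0 };
    int fd = open("datafile", O_WRONLY | O_CREAT, 0644);

    if (fd < 0)
        return 1;

    write(fd, buf, sizeof buf);  /* data handed to the kernel...     */
                                 /* <-- the window: an outage here   */
                                 /*     loses the un-synced data     */
    fsync(fd);                   /* ...durable only once this        */
                                 /*    returns                       */
    close(fd);
    return 0;
}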

> I've heard rumors of inexpensive disks cheating on this.  I have no
> proof of this and I'm not convinced it is true :).
>
>>> d1 was garbage the moment it dropped out of the array :).
>>
>> But it is recreatable from d2 and p, or from d2' and p'.
>>
>>
>> It is not recreatable from d2 and p', or from d2' and p.  And if you
>> have a system shutdown during that window of vulnerability, that is
>> all you will have to work with.
>
> If all your hardware and drivers work correctly this state can never be
> reached for data that has been fsynced.


Agreed, but again, fsync takes real wall-clock time.  It is during
that time period that you are vulnerable.

> Now, for data that has not been fsync'ed it is an entirely different story.

Every piece of data is un-fsynced for some minimum time period.  If
that is 5 milliseconds, then that is your window.
>
>>> Once a disk drops out of an array I would expect all data on the drive
>>> to be bad.
>>
>> But you expect it to be recreatable from the other drives in the
>> raidset.  Or at least I do.
>
> I do.  I fully expect my disks and controllers to honor any fsync calls.
>  You can't have any true atomicity without a fully working fsync.

fsync is NOT atomic.  With two real physical disks you do not have
the ability to synchronize the writes such that parity and data are
updated at exactly the same time.  There will always be a millisecond
or two of vulnerability.
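
As a hypothetical sketch (the function and descriptors are made up;
only the ordering problem is the point), this is roughly what the
RAID layer is stuck with:

#include <sys/types.h>
#include <unistd.h>

/* Hypothetical sketch: updating one data block and its parity block
 * on two separate devices.  No system call commits both writes as a
 * single atomic unit. */
void update_stripe(int fd_data, int fd_parity, off_t off,
                   const void *d2_new, const void *p_new, size_t len)
{
    pwrite(fd_data, d2_new, len, off);
    fsync(fd_data);                  /* d2' is durable now...        */
                                     /* <-- crash here leaves the    */
                                     /*     mismatched pair d2' + p  */
    pwrite(fd_parity, p_new, len, off);
    fsync(fd_parity);                /* ...p' lands milliseconds     */
                                     /*    later                     */
}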

>> A fully operational RAID 5 will not cause data on one drive to be
>> lost because a write is going on to another drive when you have an
>> unexpected shutdown.
>>
>
> The array might be clean, but your data won't be.  You need at least n-1
> disks in a stripe to be in sync for a stripe to be valid.  If you have 4
> disks and only 2 of them were flushed before power went out you will
> have an unrecoverable stripe.
>
> It's very similar to losing power on a single disk mid-write.
>
Exactly, but this failure mode is NOT like that.

This failure mode causes data that is not part of the write process
to be lost!

i.e., assume I am writing to LBA n, and because of a power outage LBA
n+64 is lost.  I don't think that can happen on a single-drive setup,
but it is exactly what happens in this failure mode.

Since you talk about a video server below: with a normal failure mode,
if you are recording "Lord of the Rings" when you lose power, then
"Lord of the Rings" is corrupt and you have to re-record it.  We all
expect that.

With this failure mode, you would not only lose "Lord of the Rings";
"Star Wars" might be on the same stripe and get some of its data
blocks corrupted.
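
A toy XOR calculation shows why (the block values are made up, but
the arithmetic is exactly what RAID 5 reconstruction does):

/* d1 is a block on the dropped drive ("Star Wars"), d2 is the block
 * being rewritten ("Lord of the Rings"), p is parity.  RAID 5
 * rebuilds d1 as d2 XOR p, so the mismatched pair d2' + p yields
 * garbage for d1 even though d1 was never written. */
#include <stdio.h>

int main(void)
{
    unsigned char d1 = 0xAA, d2 = 0x55;   /* blocks in one stripe   */
    unsigned char p  = d1 ^ d2;           /* consistent parity 0xFF */

    unsigned char d2_new = 0x33;          /* the in-flight write    */

    /* Crash window: d2' reached the platter, the new parity p'
     * did not.  Rebuilding the missing d1 from what survived: */
    unsigned char d1_rebuilt = d2_new ^ p;

    printf("real d1 = %02x, rebuilt d1 = %02x\n", d1, d1_rebuilt);
    /* prints: real d1 = aa, rebuilt d1 = cc */
    return 0;
}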

Greg

> I think all I'm saying is that there isn't a terribly huge difference in
> data loss between a single disk and any level of RAID during power loss.
>  No matter what, you may have data in a half-written state.
>
>> Yeah, like I said, I'm not a raid 5 fan at all.  raid 6 I think has
>> its place, but probably only in arrays with lots of drives so that you
>> can do more work in parallel.  I've pretty much given up on raid 5,
>> and this failure mode is just one more nail in the coffin for me.
>
> It doesn't matter how parallel your work load.  RAID 5 and 6 will always
> be slow for writes.
>
> Also, the failure mode you are worried about can happen with RAID 6 as
> well.  You don't even need a failed drive.  In a single disk power
> failure you lose what is in cache.  In a RAID 5 you can potentially lose
> any stripe that is in the cache of 2 or more drives.  RAID 6 would be just
> like 5 except add one drive.
>
> I notice this a lot on one of my home media server machines.  I've been
> adding disks to it for a while.  Root/boot started on a small RAID 1 on
> the first 8 gig of each disk, the rest of the disk is part of a big RAID
> 5.  As I've added disks, the easiest thing to do was to add more mirrors
> to the RAID 1.
>
> There are 6 disks in the machine, but the root/boot is "only" mirrored 4
> times.  Sometimes if there is a power blip or the machine locks up (I
> had a cpu problem a while back), one of the mirrors in the 4-way RAID 1
> will be inconsistent.  It seems to be enough to confuse MD and force me
> to manually drop one of the mirrors or the MD won't start back up.
>
> If I had known I was going to have this conversation I might have paid
> attention to see if 2 disks ever went missing at the same time.  I want to
> say this has happened at least once, but I'm not entirely certain.
>
>>> Pat
>>
>> Greg
>>
>
> Pat



-- 
Greg Freemyer
Head of EDD Tape Extraction and Processing team
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
Preservation and Forensic processing of Exchange Repositories White Paper -
<http://www.norcrossgroup.com/forms/whitepapers/tng_whitepaper_fpe.html>

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com


