[ale] RAID 5 - OK, now it's hardware

Gregory C. Johnsom mailreply at GregJohnson.Com
Mon Apr 11 09:52:57 EDT 2005


Keith,

Thanks for the reply...  Naturally, I forgot to attach the 
dmesg/mdstat/mdadm dumps to my first mail; they are in the mail 
immediately following it.

I'm reading the md source, but I'm still not sure what, if anything, to 
do.  I don't hear much disk activity from the box, it's been going 
for nearly 3 days, and how long should 500GB take to sync anyway?  
I can't see any kind of progress indicator.  
(Naturally, the debugging macro that would report this info is turned off.)
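
As a rough sanity check on the "how long should 500GB take" question, here is a back-of-the-envelope sketch. The throughput figures are hypothetical, not measured on this box, and it simply treats the full 500GB as the amount of data to stream:

```python
def sync_hours(size_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to stream size_gb (decimal GB) at rate_mb_per_s (decimal MB/s)."""
    seconds = (size_gb * 1000) / rate_mb_per_s  # 1 GB = 1000 MB
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical sustained rates; the real rate on this hardware is unknown.
    for rate in (1, 10, 30):
        print(f"{rate:>3} MB/s -> {sync_hours(500, rate):6.1f} hours")
```

At 1 MB/s this comes to roughly 139 hours (nearly six days); at 10 MB/s it is about 14 hours. So three days without completion would be consistent with either a heavily throttled sync or one that is not running at all.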

I'm not getting resets at the moment, so who knows.  Most likely 
nothing is happening, so nothing is generating errors.

<Insert several hours and a mail list bounce here>

OK, I bit the bullet and ran (IIRC) /mdadm --manage --run --force/, 
which seems to have cleared things up a bit; in particular, 
--detail now shows a progress indicator.
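
For anyone else hunting for the progress indicator: once a sync is actually running, /proc/mdstat usually carries a "resync" or "recovery" line as well. Below is a minimal sketch that pulls the numbers out of it. The sample text is illustrative only (it is not from this system), and the regular expression assumes the commonly seen line layout, which may differ between kernel versions:

```python
import re

# Illustrative sample only; NOT taken from the system in this thread.
SAMPLE = """\
Personalities : [raid5]
md0 : active raid5 sdc1[3] sdb1[1] sda1[0]
      976767872 blocks level 5, 64k chunk, algorithm 2 [3/2] [UU_]
      [==>..................]  recovery = 12.6% (61543424/488383936) finish=127.5min speed=55768K/sec
unused devices: <none>
"""

ARRAY_RE = re.compile(r"^(md\d+)\s*:")
# Assumed layout of the progress line.
PROGRESS_RE = re.compile(
    r"(resync|recovery)\s*=\s*([\d.]+)%\s*\((\d+)/(\d+)\)"
    r"\s*finish=([\d.]+)min\s*speed=(\d+)K/sec"
)


def sync_progress(mdstat_text):
    """Return one dict per array that reports a resync/recovery line."""
    results = []
    current = None  # name of the array whose stanza we are inside
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        progress = PROGRESS_RE.search(line)
        if progress and current:
            results.append({
                "array": current,
                "action": progress.group(1),
                "percent": float(progress.group(2)),
                "finish_min": float(progress.group(5)),
                "speed_kb_s": int(progress.group(6)),
            })
    return results


if __name__ == "__main__":
    # On a live system, pass open("/proc/mdstat").read() instead of SAMPLE.
    for entry in sync_progress(SAMPLE):
        print(entry)
```

An empty result means no array is reporting a sync in progress, which would match what I was seeing before forcing the array to run.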

Waited for the sync to complete and ran jacksum against all the LVM volumes, 
which generated the plethora of resets mentioned in the original 
mail.  jacksum's output was piped to an SMB mount, so (almost) all of 
the local system's activity should have been reads.

Does anyone have any idea what to do about the following errors, or know 
where I should look next?

=============== lspci =================
<Command hangs>, but it's an nForce3 board with 2xPromise 100tx2 ide 
controllers + a Matrox G400 & a video capture card awaiting identification.

=============== dmesg =================

 dma_timer_expiry: dma status == 0x44
hdl: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
PDC202XX: Secondary channel reset.
hdl: DMA interrupt recovery
hdl: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
hdl: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
PDC202XX: Secondary channel reset.
hdl: DMA interrupt recovery
hdl: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
hdl: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
PDC202XX: Secondary channel reset.
hdl: DMA interrupt recovery
hdl: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
hdl: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
PDC202XX: Secondary channel reset.
hdl: DMA interrupt recovery
hdl: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
hdl: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
PDC202XX: Secondary channel reset.
hdl: DMA interrupt recovery
hdl: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
hdl: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
PDC202XX: Secondary channel reset.
hdl: DMA interrupt recovery
hdl: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
hdl: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
PDC202XX: Secondary channel reset.
hdl: DMA interrupt recovery
hdl: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
hdl: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
PDC202XX: Secondary channel reset.
hdl: DMA interrupt recovery
hdl: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
hdl: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
PDC202XX: Secondary channel reset.
hdl: DMA interrupt recovery
hdl: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
hdl: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
PDC202XX: Secondary channel reset.
hdl: DMA interrupt recovery
hdl: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
hdl: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
PDC202XX: Secondary channel reset.
hdl: DMA interrupt recovery
hdl: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
hdl: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
PDC202XX: Secondary channel reset.
hdl: DMA interrupt recovery
hdl: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
hdl: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
PDC202XX: Secondary channel reset.
hdl: DMA interrupt recovery
hdl: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
hdl: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
PDC202XX: Secondary channel reset.
hdl: DMA interrupt recovery
hdl: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
hdl: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
PDC202XX: Secondary channel reset.
hdl: DMA interrupt recovery
hdl: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
hdl: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
PDC202XX: Secondary channel reset.
hdl: DMA interrupt recovery
hdl: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
hdl: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
PDC202XX: Secondary channel reset.
hdl: DMA interrupt recovery
hdl: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
hdl: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
PDC202XX: Secondary channel reset.
hdl: DMA interrupt recovery
hdl: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
hdl: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
PDC202XX: Secondary channel reset.
hdl: DMA interrupt recovery
hdl: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
hdl: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
PDC202XX: Secondary channel reset.
hdl: DMA interrupt recovery
hdl: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
hdl: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
PDC202XX: Secondary channel reset.
hdl: DMA interrupt recovery
hdl: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
hdl: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
PDC202XX: Secondary channel reset.
hdl: DMA interrupt recovery
hdl: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
hdl: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
PDC202XX: Secondary channel reset.
hdl: DMA interrupt recovery
hdl: lost interrupt
hdl: dma_timer_expiry: dma status == 0x44
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Secondary channel reset.
hdl: DMA interrupt recovery
hdl: lost interrupt
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdl: dma_timer_expiry: dma status == 0x44
PDC202XX: Secondary channel reset.
hdl: DMA interrupt recovery
hdl: lost interrupt
hdl: dma_timer_expiry: dma status == 0x44
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Secondary channel reset.
hdl: DMA interrupt recovery
hdl: lost interrupt
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdl: dma_timer_expiry: dma status == 0x44
PDC202XX: Secondary channel reset.
hdl: DMA interrupt recovery
hdl: lost interrupt
hdl: dma_timer_expiry: dma status == 0x44
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Secondary channel reset.
hdl: DMA interrupt recovery
hdl: lost interrupt
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdl: dma_timer_expiry: dma status == 0x44
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Secondary channel reset.
hdl: DMA interrupt recovery
hdl: lost interrupt
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdl: dma_timer_expiry: dma status == 0x44
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Secondary channel reset.
hdl: DMA interrupt recovery
hdl: lost interrupt
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdl: dma_timer_expiry: dma status == 0x44
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Secondary channel reset.
hdl: DMA interrupt recovery
hdl: lost interrupt
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdl: dma_timer_expiry: dma status == 0x44
PDC202XX: Secondary channel reset.
hdl: DMA interrupt recovery
hdl: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdl: dma_timer_expiry: dma status == 0x44
PDC202XX: Secondary channel reset.
hdl: DMA interrupt recovery
hdl: lost interrupt
eth0: no IPv6 routers present
eth0: no IPv6 routers present
hdj: dma_timer_expiry: dma status == 0x44
nfs: server 192.168.10.101 not responding, still trying
nfs: server 192.168.10.101 not responding, still trying
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
nfs: server 192.168.10.101 not responding, still trying
nfs: server 192.168.10.101 not responding, still trying
hdl: dma_timer_expiry: dma status == 0x44
nfs: server 192.168.10.101 not responding, still trying
PDC202XX: Secondary channel reset.
hdl: DMA interrupt recovery
hdl: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
SMB connection re-established (-5)
hdl: dma_timer_expiry: dma status == 0x44
PDC202XX: Secondary channel reset.
hdl: DMA interrupt recovery
hdl: lost interrupt
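
To see which drive and channel the resets land on, a dump like the one above can be tallied rather than read by eye. This is a minimal sketch; the sample is a short excerpt of the lines above, and the patterns assume exactly the message wording shown in this dump:

```python
import re
from collections import Counter

# Short excerpt of the dmesg output above.
SAMPLE = """\
hdj: dma_timer_expiry: dma status == 0x44
hdl: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
PDC202XX: Secondary channel reset.
hdl: DMA interrupt recovery
hdl: lost interrupt
hdj: dma_timer_expiry: dma status == 0x44
PDC202XX: Primary channel reset.
hdj: DMA interrupt recovery
hdj: lost interrupt
"""

EXPIRY_RE = re.compile(r"^(hd[a-z]+): dma_timer_expiry")
RESET_RE = re.compile(r"^PDC202XX: (Primary|Secondary) channel reset")


def tally(log_text):
    """Count DMA timer expiries per drive and channel resets per channel."""
    expiries, resets = Counter(), Counter()
    for line in log_text.splitlines():
        line = line.strip()
        expiry = EXPIRY_RE.match(line)
        if expiry:
            expiries[expiry.group(1)] += 1
            continue
        reset = RESET_RE.match(line)
        if reset:
            resets[reset.group(1)] += 1
    return expiries, resets


if __name__ == "__main__":
    # On a live system, feed in the saved output of dmesg instead of SAMPLE.
    expiries, resets = tally(SAMPLE)
    print("DMA timer expiries per drive:", dict(expiries))
    print("Channel resets:", dict(resets))
```

In the full dump above, hdj shows up far more often than hdl, but both appear, so more than one drive is affected.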

Keith Hopkins wrote:

> Hi Greg,
>
>   After recently recovering from a multiple drive failure on a Raid5 
> setup, I think I can say 'it's (probably) not toast'.
>
>   First, see which drive you are getting the resets on.  If you are 
> getting resets on multiple drives, do some isolation troubleshooting 
> and consider replacing the cables or even the controller.  Load up an 
> unused drive, and use ddrescue to copy as much data as possible from 
> the resetting drive to the unused drive.  Now, swap your 'fresh' drive 
> with the one which was resetting, being sure to remove the 'resetting' 
> drive completely from the system, and try to `assemble` the array 
> again.  There is no point in trying to recover your system on flaky 
> hardware.
>
>   MD status is in /proc/mdstat.  Just `cat` it.
>
> --Keith
>
>
> Gregory C. Johnsom wrote:
>
>> Hello world,
>>
>> Cutting to the chase, I have a RAID 5 array created with a "missing" 
>> drive and a RAID0 assemblage as placeholders for where the data 
>> source drives will go.
>>
>> I was not able to set up the RAID atomically, and since starting the 
>> process suffered a fried PS.  On the new box, I set up the RAID 
>> using the script I originally developed (creatively named "bootRaid") 
>> and tried to finish the process.  It did not go well, and dmesg 
>> started showing a lot of channel resets. (If memory serves).  I 
>> dropped the box, and upon reboot md0 (the big RAID5) refused to 
>> start.  I've waited 2-3 days for the sync to complete, and it has 
>> not.  Just before the last operation the array showed several hundred 
>> gig of data on it, so it's worth salvaging if I can.
>>
>> During this time, I've tried to find anything resembling current 
>> guidance on the md drivers and recovery thereof.  I have yet to find 
>> anything that will indicate whether a re-sync is actually in progress 
>> and if so where it stands.
>>
>> The system has no OS yet, so I'm running Knoppix 3.7 using 2.6 for the 
>> EVMS (LVM2?) support.
>>
>> Is a dirty, degraded raid5 array toast?
>> If yes, I assume this is why most controllers emphasize RAID 01/10.  
>> I favor 01 in this scenario.  What think you?
>>
>> If no, how can I get this thing back up and avoid wasting days on 
>> "high availability" again?
>>
>> Thanks,
>> -Greg
>>
>> _______________________________________________
>> Ale mailing list
>> Ale at ale.org
>> http://www.ale.org/mailman/listinfo/ale
>
>



