[ale] Ugh! Kernel panic when loading ext3 modules in RHAS 2.1 w/kernel 2.4.9-e30.smp

Jonathan Glass jonathan.glass at ibb.gatech.edu
Mon Dec 29 21:10:58 EST 2003


> On Mon, 2003-12-29 at 17:43, Dow Hurst wrote:
>> So, unless you can force the RAID card to release even when the slave
>> has crashed, you will have to cut power to the slave completely, right?
>> The last operation of the slave locks the RAID card in a write mode
>> which won't let go?  Just wondering what you've found.  I am not
>> familiar with RAID cards but am looking thru the O'Reilly book on Linux
>> RAID.
>> Dow
>>
> Is this using a separate RAID card in each server, with the shared disk
> subsystem being a JBOD (Just a Bunch Of Disks)?
>
> The only raid controllers I know of that support that configuration are
> the IBM ServeRAID cards.  Is that what is in use?
>
> If so, I have considered using those cards and would be interested in
> what caused the failure.
>
> FYI: Alan Robertson on the Linux-HA mailing lists works for IBM and
> seems willing to chase down problems related to the ServeRAID cards.
> That may assume you are using the heartbeat clustering software.
>
> i.e., Alan works for IBM, but I think working on Linux-HA (heartbeat) is
> his full-time job.
>
> Greg

Actually, these servers have Dell/MegaRAID cards in them, using the
megaraid_2002 drivers.  These are connected to a Dell PowerVault 220S with
the backplane configured for clustering (joint, I think), fully populated
w/13 146GB U320 SCSI drives.  Unfortunately, Dell refuses to sell me the
U320 SCSI RAID card, insisting it isn't compatible with Linux, thus I'm
stuck using the drives at 160MB/s.  BTW, the 220S has an onboard
controller which handles the RAID configuration (8 drives in a RAID 5 for
1TB of storage), and the controller cards just access the 220S's BIOS for
configuration data.
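
As a quick sanity check on the 1TB figure: RAID 5 gives up one drive's
worth of space to parity, so 8 drives of 146GB leave roughly 7 x 146GB
usable, and the array tolerates only a single failed drive.  A minimal
sketch of that arithmetic follows; the function names are hypothetical
and it ignores formatting overhead and GB-vs-GiB rounding.

```python
def raid5_usable_gb(num_drives, drive_gb):
    """Usable capacity of a RAID 5 set: one drive's worth goes to parity."""
    if num_drives < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (num_drives - 1) * drive_gb


def raid5_survives(failed_drives):
    """RAID 5 stays readable with at most one failed member drive."""
    return failed_drives <= 1


if __name__ == "__main__":
    # Figures from the post: 8 drives of 146GB each in the RAID 5 set.
    print(raid5_usable_gb(8, 146), "GB usable")
    print("survives 1 failure:", raid5_survives(1))
    print("survives 2 failures:", raid5_survives(2))
```

The second check is why two simultaneous drive failures are fatal for a
RAID 5 set, no matter how many drives it contains.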

This chassis has lost 6 drives since May 2003, and at one point it took
down drive 0 on both servers' internal RAID arrays.  During the last
failure, two drives went down simultaneously in my RAID 5, causing me no
end of grief and heartburn.  Dell has replaced the backplane in the 220S
(basically rebuilt the whole box), and is supposed to be replacing both
machines.  Granted, the week before Thanksgiving they promised to replace
them within 30 days, and it still hasn't happened.  I'm so annoyed with
this server setup that I'm looking at IBM servers for my next rollout.

Yes, I'm using the heartbeat s/w, as I think it is bundled in the RHAS 2.1
software.

Thanks

Jonathan Glass


