[ale] Hardware diagnostics

Greg Freemyer greg.freemyer at gmail.com
Thu Jan 6 14:02:32 EST 2005


I have a machine that is introducing occasional data corruption.

For instance, I just copied 200 GB between two drives (one 3ware
raid0, one 250GB PATA).

I then verified the data was the same using cmp --verbose.

It found two one-byte differences, so I'm seeing roughly 1 corrupted
byte per 100 GB copied.

Assuming 'cmp --verbose' prints byte values in octal, each byte had a
single bit set in the copy that was not set in the original, and it
was the same bit (octal 200) both times.

i.e. 20 --> 220   and   0 --> 200
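
The single-bit claim can be checked by XORing each pair of values. This is only a minimal sketch: it assumes the values above are octal, as stated, and the helper name flipped_bits is made up for illustration.

```python
def flipped_bits(original_octal, copy_octal):
    """Return the bit positions (0 = least significant) that differ
    between two byte values given as octal strings."""
    diff = int(original_octal, 8) ^ int(copy_octal, 8)
    return [bit for bit in range(8) if diff & (1 << bit)]


# The two differing byte pairs from the cmp output, as (original, copy).
# Assumption: cmp printed these values in octal.
pairs = [("20", "220"), ("0", "200")]

for original, copy in pairs:
    bits = flipped_bits(original, copy)
    print(f"{original:>3} --> {copy:>3}: flipped bit(s) {bits}")
```

Under that assumption, each pair differs in exactly one bit, and it is bit 7 (the high bit of the byte) in both cases.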

I know I need to run a memory test, but are there any other diagnostics I
could run to try to figure out what hardware is bad?

I'm also wondering if I need to bite the bullet and replace the
motherboard with one that has ECC RAM.

FYI: I'm pretty sure it is not the 3ware card (I've had corruption
when none of the data was on disks controlled by that card.)  I've
also already changed out the ATA controller.

Thanks
Greg
-- 
Greg Freemyer


