[ale] Something I thought I'd never see

Danny Cox DCox at icc.net
Wed Oct 11 09:56:42 EDT 2006


Jeff,

On Wed, 2006-10-11 at 09:31 -0400, Jeff Lightner wrote:
<snip>

> However I'm wondering how I might have figured this out if I hadn't
> been able to narrow down the day except by running ps -ef and looking
> for oddities such as the ones I found?   This prompted the question
> above.   I often see what appear to me to be abnormally high load
> averages (as compared to what I'd think reasonable on the UNIX boxes
> I've worked on) but they don't seem to actually impact performance
> overall.

	With a "ps ef" you'll occasionally see processes in 'D' state.
Usually, you'll only be able to catch one or two in that state, and the
next time you run ps, they'll be 'R'unning or 'S'leeping.  

	'D' is described as a "short sleep".  It's present during the time the
kernel is running on behalf of the process doing disk I/O.  That's
usually much less than a second.

	So, if you're continually seeing processes stuck in 'D' state, that's
probably filesystem corruption, or a disk slowly dying.
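
	If you'd rather not eyeball ps output over and over, here is a rough
Python sketch of the same check.  It assumes a Linux /proc where the state
letter is the first field after the parenthesised command name
in /proc/<pid>/stat.  The function names, the sample count and the interval
are made up for illustration.

```python
import os
import time

def parse_state(stat_line):
    """Return (pid, comm, state) from the text of /proc/<pid>/stat."""
    # comm sits in parentheses and may contain spaces or ')' itself,
    # so split on the LAST ')' instead of splitting on whitespace.
    head, _, tail = stat_line.rpartition(")")
    pid, _, comm = head.partition(" (")
    return int(pid), comm, tail.split()[0]

def d_state_processes(proc_root="/proc"):
    """List (pid, comm) for every process that is in 'D' state right now."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_state(f.read())
        except OSError:
            continue  # the process went away while we were looking
        if state == "D":
            found.append((pid, comm))
    return sorted(found)

def stuck_in_d(samples=5, interval=1.0, proc_root="/proc"):
    """Return the processes that were in 'D' state in every sample taken."""
    seen = None
    for i in range(samples):
        now = set(d_state_processes(proc_root))
        seen = now if seen is None else seen & now
        if i < samples - 1:
            time.sleep(interval)
    return sorted(seen or [])
```

	Anything stuck_in_d() returns has been in 'D' for the whole sampling
window, which is the case worth chasing.  A process that only shows up in a
single sample is most likely just doing ordinary I/O.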

	You can do an ls -l on /proc/<pid>/fd to see what files it has open.
One of those will be the problem child.  You can then determine the
filesystem in question.
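
	The ls -l step can be scripted as well.  A minimal sketch, assuming
the entries under /proc/<pid>/fd are symlinks to whatever each descriptor
points at (open_files is just a name I picked):

```python
import os

def open_files(pid, proc_root="/proc"):
    """Map each open fd number of `pid` to the target of its /proc symlink."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    result = {}
    for name in os.listdir(fd_dir):
        try:
            result[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # the fd was closed while we were looking
    return result
```

	You'll need to be root, or the owner of the process, to read that
directory.  Sockets and pipes show up as names like socket:[...] rather
than as paths, so those can be skipped when hunting for a bad filesystem.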

	You might also try using strace -p <pid> to trace the process.  It may
show the system call the process is currently stuck in.  If it does, the
first argument to a read or write is the fd.  Then look at
/proc/<pid>/fd/<fd> to determine the filesystem in question.
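
	To tie those last two steps together, here is a hedged sketch that
pulls the fd out of a line of strace output and then matches a path against
the contents of /proc/mounts.  The function names are hypothetical, and the
mount lookup is a plain longest-prefix match that ignores the \040 escapes
used for mount points with spaces in them.

```python
import re

def fd_from_strace(line):
    """Return the fd from a read(...) or write(...) line, or None."""
    m = re.match(r"\s*(?:\[pid\s+\d+\]\s+)?(?:read|write)\((\d+),", line)
    return int(m.group(1)) if m else None

def mount_for(path, mounts_text):
    """Return (device, mount_point, fstype) for the mount that holds `path`."""
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        device, mount_point, fstype = fields[:3]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Longest mount point wins; on a tie the later entry wins,
            # since it was mounted on top of the earlier one.
            if best is None or len(mount_point) >= len(best[1]):
                best = (device, mount_point, fstype)
    return best
```

	Feed mount_for() the path you got from /proc/<pid>/fd/<fd> and the
text of /proc/mounts, and it should hand back the device to go run fsck or
SMART checks against.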

	Good luck!

-- 
Daniel S. Cox
Internet Commerce Corporation




