[ale] diagnosis
James P. Kinney III
jkinney at localnetsolutions.com
Sun Apr 25 19:26:14 EDT 2004
On Sun, 2004-04-25 at 18:28, David Corbin wrote:
> On Saturday 24 April 2004 11:13, James P. Kinney III wrote:
> > > The "investigation" I ran yesterday *was* in single user mode. And to
> > > keep things fresh in your memory, as soon as the /var/run/utmp file
> > > exists (even in single user mode), memory starts disappearing from free
> > > to be used by buffers. If that file is not there (when I mount /var) I
> > > do not see evidence of the memory leak. I've never let it exhaust memory
> > > while in single user mode, but at run-level 2 (normal) it eventually runs
> > > out of memory to allocate. I wouldn't really say the system crashes,
> > > but none of the applications can operate as no RAM is available for them.
> >
> > Well, utmp is a storage area for logins and usage info. If that file is
> > growing in single user mode with nothing else running, you have a
> > problem. The kernel should be what is generating the data for the utmp
> > file. Since the presence of utmp initiates the memory loss, I would
> > suspect that kernel is corrupted and is not flushing the write to utmp
> > and is instead buffering the write process and/or data. This may
> > indicate a bad hard drive, trojaned kernel or failing RAM.
> >
>
> I'm reasonably sure it's not a trojaned kernel - building a new kernel from
> another machine was one of the first tests (though I didn't put it on a CD,
> but installed it on the hard drive, I admit...)
Don't build a kernel on another machine and move it. Use a kernel from a
distro CD that you _know_ is clean.
>
> > Run memtest and rule out that. Then copy a kernel from a CD distribution
> > and set lilo/grub to use that kernel. Then boot to single user, touch
> > utmp, reboot back to single user with the same CD kernel and watch the
> > top process. If the problem is still there, drop in another hard drive,
> > make it the /var partition, and try again.
> >
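To watch this during the single-user test, a small sampler along these
lines could log free memory against buffers. This is only a rough
sketch, not something already posted in the thread: meminfo_sample is a
made-up helper name, and it assumes /proc is mounted and that
/proc/meminfo carries the MemFree:, Buffers: and Cached: lines.

```shell
#!/bin/sh
# Sketch: print one timestamped line with MemFree, Buffers and Cached (kB).
# meminfo_sample is a hypothetical helper; it reads /proc/meminfo by
# default, or the file given as $1.
meminfo_sample() {
    src="${1:-/proc/meminfo}"
    awk -v now="$(date '+%Y-%m-%d %H:%M:%S')" '
        $1 == "MemFree:" { free = $2 }
        $1 == "Buffers:" { buf = $2 }
        $1 == "Cached:"  { cached = $2 }
        END { printf "%s free=%skB buffers=%skB cached=%skB\n", now, free, buf, cached }
    ' "$src"
}

# Example use (assumed log path), one sample a minute:
#   while :; do meminfo_sample >> /tmp/meminfo.txt; sleep 60; done
```

Take one run with utmp present and one without. If buffers climb only
when the file exists, that narrows things down without relying on top.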
>
> When you say "memtest", you're referring to the shell-script that does lots of
> tarring/untarring?
Memtest is the bootable check-the-RAM-every-which-way-to-Tuesday test.
You boot straight into it, and it stresses the RAM to find any weird errors.
>
> > If all that fails, get a Geiger counter and start looking for a
> > radiation source that can cause bit flips :)
> >
> > > > On Fri, 2004-04-23 at 17:37, David Corbin wrote:
> > > > > I tried it with the "safe" version of top. It shows nothing that
> > > > > isn't in my regular top. However, I did try "vmstat" which was
> > > > > there. It shows that the free memory is disappearing as the
> > > > > "buffers" figure grows.
> > > > >
> > > > > Does that help any?
> > > > >
> > > > > On Monday 19 April 2004 20:35, James P. Kinney III wrote:
> > > > > > I put up a page with the binaries and source on it :
> > > > > >
> > > > > > http://www.localnetsolutions.com/tools/
> > > > > >
> > > > > > Note: the procps page on sourceforge did not have an md5 checksum.
> > > > > >
> > > > > > On Mon, 2004-04-19 at 20:02, David Corbin wrote:
> > > > > > > On Monday 19 April 2004 15:01, James P. Kinney III wrote:
> > > > > > > > If it is a cracked machine, running a statically linked top
> > > > > > > > from a CD will gain access to the real top data. Top is a
> > > > > > > > common binary for a root kit to fiddle with.
> > > > > > >
> > > > > > > Sounds reasonable. Can you point me at such, or if not that,
> > > > > > > anybody got any idea where the source to top is and I'll build my
> > > > > > > own.
> > > > > > >
> > > > > > > > It is certainly possible to _add_ a module or _remove_ a
> > > > > > > > module, but not to change out the kernel without a reboot (unless
> > > > > > > > 2-kernel-monte is available, I have not been able to find this
> > > > > > > > :( ). So the actual data stream for top is not easily tampered
> > > > > > > > with. Thus a known good statically-linked top would give
> > > > > > > > access to the running system and show the _real_ processes that
> > > > > > > > are running.
> > > > > > > >
> > > > > > > > If top shows no malicious files, it's time to take some
> > > > > > > > snapshots over time to plot which app is failing.
> > > > > > > >
> > > > > > > > #!/bin/sh
> > > > > > > > date >> /tmp/top.txt
> > > > > > > > top -b -n 1 -c >> /tmp/top.txt
> > > > > > > > echo "###############" >>/tmp/top.txt
> > > > > > > > echo >>/tmp/top.txt
> > > > > > > > echo >>/tmp/top.txt
> > > > > > > >
> > > > > > > > Run as a cron job every minute for an hour.
> > > > > > > >
> > > > > > > > If you want, you can now mash/mangle the data into a nice plot
> > > > > > > > using some perl and gnuplot (or a spreadsheet).
> > > > > > > >
> > > > > > > > On Mon, 2004-04-19 at 11:56, Geoffrey wrote:
> > > > > > > > > Dow Hurst wrote:
> > > > > > > > > > How can we find the process that is soaking the memory?
> > > > > > > > > > How do you manipulate /proc to find out the originating
> > > > > > > > > > process that owns the memory being used? I know IRIX had
> > > > > > > > > > tools to look at memory and see which processes owned what
> > > > > > > > > > part of memory. Does Linux?
> > > > > > > > > >
> > > > > > > > > > Seems if you knew what was leaking you would have a major
> > > > > > > > > > part of the battle won.
> > > > > > > > >
> > > > > > > > > I believe we mentioned top, but he noted that doesn't give
> > > > > > > > > him anything. That's what concerns me. If it doesn't show,
> > > > > > > > > is it being hidden for a reason???
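On the /proc question: each process reports its resident memory as
VmRSS in /proc/<pid>/status, so a rough per-process listing can be
pulled without top. This is a sketch under assumptions; rss_by_process
is a made-up helper name, and memory held by the kernel as buffers or
cache is not charged to any process, so it will not appear here.

```shell
#!/bin/sh
# Sketch: list processes by resident memory, largest first.
# rss_by_process is a hypothetical helper. It reads VmRSS from each
# <root>/<pid>/status (root defaults to /proc). Entries with no VmRSS
# line, such as kernel threads, are skipped.
rss_by_process() {
    root="${1:-/proc}"
    for f in "$root"/[0-9]*/status; do
        [ -r "$f" ] || continue
        awk '
            $1 == "Name:"  { name = $2 }
            $1 == "Pid:"   { pid = $2 }
            $1 == "VmRSS:" { rss = $2 }
            END { if (rss != "") printf "%8d kB  %6d  %s\n", rss, pid, name }
        ' "$f" 2>/dev/null
    done | sort -rn
}

# Example use: show the ten largest processes.
#   rss_by_process | head -n 10
```

If the per-process numbers stay flat while free memory keeps dropping,
that points at kernel-side buffers rather than a leaking application.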
> > > > >
> > > > > _______________________________________________
> > > > > Ale mailing list
> > > > > Ale at ale.org
> > > > > http://www.ale.org/mailman/listinfo/ale
--
James P. Kinney III \Changing the mobile computing world/
CEO & Director of Engineering \ one Linux user /
Local Net Solutions,LLC \ at a time. /
770-493-8244 \.___________________________./
http://www.localnetsolutions.com
GPG ID: 829C6CA7 James P. Kinney III (M.S. Physics)
<jkinney at localnetsolutions.com>
Fingerprint = 3C9E 6366 54FC A3FE BA4D 0659 6190 ADC3 829C 6CA7