[ale] Lab Workstation Mystery

Jim Kinney jim.kinney at gmail.com
Mon Mar 28 13:22:30 EDT 2016


http://unixbhaskar.blogspot.com/2014/02/how-to-fix-read-only-root-file-system.html?m=1

Not a solution, but something to look at: fstab.
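
A minimal sketch of what I'd check first, assuming an ext-family root filesystem: a stock Ubuntu install typically puts errors=remount-ro on the root entry in /etc/fstab, and that option is what tells the kernel to flip / to read-only when it hits a filesystem error. The helper name and the sample entry below are hypothetical, not taken from the affected machines.

```python
def root_mount_options(fstab_text):
    """Return the option list for the '/' entry in fstab-style text, or None."""
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # fstab fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == "/":
            return fields[3].split(",")
    return None


# Hypothetical sample entry for illustration only.
SAMPLE_FSTAB = """\
# <file system> <mount point> <type> <options> <dump> <pass>
UUID=hypothetical-uuid / ext4 errors=remount-ro 0 1
UUID=hypothetical-swap none swap sw 0 0
"""

options = root_mount_options(SAMPLE_FSTAB)
print("errors=remount-ro" in options)
```

On a real workstation you would pass in the contents of /etc/fstab instead of the sample. If the option is there, the read-only remount is probably a symptom of an underlying disk or filesystem error rather than the cause.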
On Mar 28, 2016 11:57 AM, "Todor Fassl" <fassl.tod at gmail.com> wrote:

> I have a mysterious problem with workstations in a shared-use environment.
> There are 2 labs in different buildings, one with 6 workstations and one
> with 8. These workstations are used by a group of about 30 grad student
> TAs. All are running Ubuntu 15.10. Authentication is via LDAP and home
> directories are mounted via NFS. Every day, 2 or 3 of the machines go
> down. The earliest symptom I can find is that the root filesystem is
> remounted read-only. Soon they stop responding to ssh and SNMP and they
> are essentially locked up. They still respond to pings, though.
>
> I've caught the machines in the period where the root filesystem is
> read-only but I can still ssh to them. In that state, I cannot NFS-mount
> home directories from our file server. I can mount NFS shares from other
> servers. And I can mount the same home directories if I go to another
> workstation. Restarting NFS on the file server has no effect.
>
> When I try to mount a home directory on an affected machine, the mount
> just hangs. I ran it with strace and it just showed it was waiting -- for
> what, I'm not sure, and I don't have a screen cap available at the moment. I
> put a packet sniffer on the server and it showed it received a single
> packet from the client and that's it.
>
> There is nothing in the logs on the client. In fact, they simply stop at
> some point in the process. At first I attributed this to the root
> filesystem being read-only, but it continued after I moved /var to a
> separate filesystem. At some point it just stops writing records to the
> syslog, but I don't know if that's before or after the root filesystem is
> remounted read-only.
>
> Many of the TAs also have identical workstations in their offices. None of
> those machines seems to have this problem. The TAs do tend to walk away
> from the workstations w/o logging out. But I wrote a script to kill off
> their sessions and it didn't help. I had it send me an email whenever it
> killed somebody's session, and the failures don't seem to be correlated
> with that. In other words, sometimes a machine goes down even if everyone
> who used it remembered to log out.
>
> I'm pretty desperate. Any ideas?
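
On the question of whether syslog stops before or after the remount: a rough sketch, assuming the workstation can still send UDP to some other log host once the trouble starts. It polls /proc/mounts and reports to a remote syslog server, so the timestamps do not depend on the local disk. The names rootwatch, watch_root and log_host are made up for illustration, and UDP port 514 is just the conventional syslog port.

```python
import logging
import logging.handlers
import time


def root_is_readonly(mounts_text):
    """Return True if the last '/' entry in /proc/mounts-style text has 'ro'."""
    readonly = False
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == "/":
            # A later entry for the same mount point overrides an earlier one.
            # Split on commas so 'errors=remount-ro' is not mistaken for 'ro'.
            readonly = "ro" in fields[3].split(",")
    return readonly


def watch_root(log_host, interval=30):
    """Poll /proc/mounts; log heartbeats and the first read-only sighting.

    log_host is a hypothetical remote syslog server listening on UDP 514.
    Returns the time at which the read-only state was first seen.
    """
    logger = logging.getLogger("rootwatch")
    logger.setLevel(logging.INFO)
    logger.addHandler(logging.handlers.SysLogHandler(address=(log_host, 514)))
    while True:
        with open("/proc/mounts") as f:
            if root_is_readonly(f.read()):
                logger.warning("root filesystem is mounted read-only")
                return time.time()
        logger.info("root filesystem still read-write")
        time.sleep(interval)
```

The heartbeat lines are there on purpose: if they stop arriving on the log host before any read-only warning shows up, the machine lost its network path or hung before the remount, not after.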