[ale] Lab Workstation Mystery

Jim Kinney jkinney at jimkinney.us
Mon Mar 28 16:55:05 EDT 2016


Are they on the same switch?
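
Also, since syslog quits before you can see what happens, a small poller that notices the moment / flips to read-only and records the time somewhere off the root filesystem might at least give you reliable failure times to line up against. Here is a minimal Python sketch. It assumes the standard Linux /proc/mounts layout, and the function names and log path are made up for illustration. Nothing like this exists on those boxes already.

```python
import time
from typing import List, Optional


def root_mount_options(mounts_text: str) -> Optional[List[str]]:
    """Return the mount options for '/', or None if '/' is not listed.

    Each /proc/mounts line reads: device mountpoint fstype options dump pass.
    If '/' is listed more than once, the last entry is the one mounted on top.
    """
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == "/":
            options = fields[3].split(",")
    return options


def root_is_readonly(mounts_text: str) -> bool:
    """True if '/' carries the 'ro' option.

    Compare whole tokens, not substrings: 'errors=remount-ro' (Ubuntu's
    default for the root filesystem) contains 'ro' but does not mean read-only.
    """
    options = root_mount_options(mounts_text)
    return options is not None and "ro" in options


def watch(log_path: str, interval: float = 30.0) -> None:
    """Poll /proc/mounts; append a timestamp to log_path once '/' goes read-only.

    log_path is a hypothetical location. It must be on a filesystem other
    than '/', or this write will fail the same way everything else does.
    """
    while True:
        with open("/proc/mounts") as mounts:
            readonly = root_is_readonly(mounts.read())
        if readonly:
            with open(log_path, "a") as log:
                log.write(time.strftime("%Y-%m-%d %H:%M:%S") + " / is read-only\n")
            return
        time.sleep(interval)
```

Something like `watch("/var/log/root-ro-watch.log")` started at boot would do. If /var stops taking writes too (the syslog gap suggests it might), sending the timestamp to another host would be more reliable.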

On March 28, 2016 3:00:58 PM EDT, Todor Fassl <fassl.tod at gmail.com> wrote:
>This particular problem, if it is a power problem, has to be caused by
>something a person could lug into the labs. I've been saying they are
>in different buildings but, technically, they are on different wings of
>the same building. I tend to think of them as different buildings
>because I usually go outdoors to get from one to another. The point is
>that they are widely separated.
>
>I haven't tried to find a pattern in the time of day. I only paid
>enough attention to the time of the crashes to be certain that there is
>no obvious pattern. The crashes occur at different times of day and night.
>
>
>
>On 03/28/2016 01:14 PM, Pete Hardie wrote:
>> I once tracked a bug that was due to the building elevator motors
>> stopping and starting differently after hours.
>>
>>
>> On Mon, Mar 28, 2016 at 1:36 PM, Dustin Strickland <
>> dustin.h.strickland at gmail.com> wrote:
>>
>>> The compressors in air conditioning units or refrigerators can also have
>>> an effect when they kick on.
>>>
>>> On Mon, Mar 28, 2016 at 1:30 PM, Jim Kinney <jkinney at jimkinney.us> wrote:
>>>
>>>> Microwave!!!
>>>>
>>>> The EM field from those can make screens go wacky and wiggly while they
>>>> run. I moved my desk away from the opposite side of the wall from the home
>>>> microwave and still had to get 10 feet away to stop the interference.
>>>>
>>>> Bit flips happen.
>>>>
>>>> On March 28, 2016 1:20:45 PM EDT, Todor Fassl <fassl.tod at gmail.com> wrote:
>>>>
>>>>>
>>>>> We've run every kind of hardware diagnostic we can think of. Besides,
>>>>> it's just these 14 machines in the 2 shared spaces. Identical machines
>>>>> in private offices don't seem to have any problem.
>>>>>
>>>>> But you're right. Some kind of power problem is the best theory I've
>>>>> seen for a while. The 2 rooms are in different buildings and they never
>>>>> had a problem before. But maybe somebody is plugging something in. Come
>>>>> to think of it, we had a similar problem years ago when a student put a
>>>>> microwave oven in his office. The computers on the other side of the
>>>>> wall kept going down. I don't know enough about electricity to explain
>>>>> that, but the microwave oven and the computers were plugged into outlets
>>>>> on opposite sides of the same wall.
>>>>>
>>>>> What kind of gizmo would a grad student be bringing into a lab that
>>>>> would make linux workstations freeze up?
>>>>>
>>>>> Another reason this theory makes sense is that I haven't gotten a single
>>>>> complaint about the machines going down. You'd think if they were going
>>>>> down while people were using them, I'd get complaints. People are always
>>>>> logged in when they go down, but that doesn't mean anything since they
>>>>> tend to walk away w/o logging out. I've looked for patterns in the list
>>>>> of users who were logged in when a machine went down but didn't see any.
>>>>> I can't rule out that it's somebody doing something, though. There might
>>>>> be a pattern and I just didn't see it. But I am sure there isn't one guy
>>>>> who is always logged in when a machine goes down.
>>>>>
>>>>> On 03/28/2016 11:05 AM, James Taylor wrote:
>>>>>
>>>>>>   The most common, if not the only, reason I've seen partitions get
>>>>>>   marked read-only is when I've had power glitches that caused a very
>>>>>>   brief interruption in connectivity to the drives.
>>>>>>   Normally that is not an issue with locally attached drives on
>>>>>>   workstations, but stranger things have happened.
>>>>>>   Are the workstations on UPS, or is the power to the rooms
>>>>>>   conditioned properly?
>>>>>>   -jt
>>>>>>
>>>>>>
>>>>>>   James Taylor
>>>>>>   678-697-9420
>>>>>>   james.taylor at eastcobbgroup.com
>>>>>>
>>>>>>
>>>>>>
>>>>>>   >>> Todor Fassl <fassl.tod at gmail.com> 3/28/2016 11:54 AM >>>
>>>>>>   I have a mysterious problem with workstations in a shared-use
>>>>>>   environment. There are 2 labs in different buildings, one with 6
>>>>>>   workstations and one with 8. These workstations are used by a group
>>>>>>   of about 30 grad student TAs. All are running Ubuntu 15.10.
>>>>>>   Authentication is via LDAP and home directories are mounted via NFS.
>>>>>>   Every day, 2 or 3 of the machines go down. The earliest symptom I can
>>>>>>   find is that the root filesystem is remounted read-only. Soon they
>>>>>>   stop responding to ssh and SNMP and they are essentially locked up.
>>>>>>   They still respond to pings, though.
>>>>>>
>>>>>>   I've caught the machines in the period where the root filesystem is
>>>>>>   read-only but I can still ssh to them. I've found that I cannot
>>>>>>   NFS-mount home directories from our file server. I can mount NFS
>>>>>>   shares from other servers. And I can mount the same home directories
>>>>>>   if I go to another workstation. Restarting NFS on the file server has
>>>>>>   no effect.
>>>>>>
>>>>>>   When I try to mount a home directory on an affected machine, the mount
>>>>>>   just hangs. I ran it with strace and it just showed it was waiting;
>>>>>>   for what, I'm not sure, and I don't have a screen cap available at the
>>>>>>   moment. I put a packet sniffer on the server and it showed it received
>>>>>>   a single packet from the client and that's it.
>>>>>>
>>>>>>   There is nothing in the logs on the client. In fact, they simply stop
>>>>>>   at some point in the process. At first I attributed this to the root
>>>>>>   filesystem being read-only, but it still happens after I moved /var to
>>>>>>   a separate filesystem. At some point it just stops writing records to
>>>>>>   the syslog, but I don't know if that's before or after the root
>>>>>>   filesystem is remounted read-only.
>>>>>>
>>>>>>   Many of the TAs also have identical workstations in their offices.
>>>>>>   None of those machines seem to have this problem. The TAs do tend to
>>>>>>   walk away from the workstations w/o logging out. But I wrote a script
>>>>>>   to kill off their sessions and it didn't help. I had it send me an
>>>>>>   email whenever it killed somebody's session, and the crashes don't
>>>>>>   seem to be correlated with that. In other words, sometimes machines go
>>>>>>   down even if everyone who has used them has remembered to log out.
>>>>>>
>>>>>>   I'm pretty desperate. Any ideas?
>>>>>>
>>>>>> ------------------------------
>>>>>>
>>>>>>   Ale mailing list
>>>>>>   Ale at ale.org
>>>>>>   http://mail.ale.org/mailman/listinfo/ale
>>>>>>   See JOBS, ANNOUNCE and SCHOOLS lists at
>>>>>>   http://mail.ale.org/mailman/listinfo
>>>>>>
>>>> --
>>>> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>
>-- 
>Todd
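
On the time-of-day and who-was-logged-in questions: if you record a timestamp and the list of logged-in users for each incident, tallying them takes a few lines. Here is a rough Python sketch. It assumes ISO-8601 timestamps such as "2016-03-28 15:00:58", and the record layout and function names are invented for illustration.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Tuple

# Hypothetical record layout: (timestamp, users logged in when it went down).
CrashRecord = Tuple[str, List[str]]


def crashes_by_hour(records: Iterable[CrashRecord]) -> Counter:
    """Count crashes per hour of day (0-23) from ISO-8601 timestamps."""
    return Counter(datetime.fromisoformat(ts).hour for ts, _ in records)


def crashes_by_user(records: Iterable[CrashRecord]) -> Counter:
    """Count how many crashes each user was logged in for.

    A user with several sessions on the same machine counts once per crash.
    """
    counts: Counter = Counter()
    for _, users in records:
        counts.update(set(users))
    return counts
```

Pass a list, not a one-shot iterator, if you call both functions on the same records. Heavy users will show up in more crashes just by being logged in more often, so the per-user counts mean more when compared against how often each person logs in at all.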
