<html><head></head><body><div>And that is a very useful capability. As this system is CentOS 5.11, I had to go through and manually stop every service until nothing was left stuck open except the hung network and the NFS connection.</div><div><br></div><div>On Tue, 2015-10-20 at 17:19 +0000, Lightner, Jeff wrote:</div><blockquote type="cite"><pre>I assume you're joking but just in case:
Systemd has services that can start and stop without depending on the entire stack of services, unlike init. However, some services may depend on SOME other services running. The beauty of this is that on a hung system you might actually be able to shut down most services even if some things like NFS are hung, so that when you power cycle you're not pulling the legs out from under as many things as you would if your init-based shutdown hung on the first script it tried to stop.
P.S. vim rules!
-----Original Message-----
From: <a href="mailto:ale-bounces@ale.org">ale-bounces@ale.org</a> [<a href="mailto:ale-bounces@ale.org">mailto:ale-bounces@ale.org</a>] On Behalf Of DJ-Pfulio
Sent: Tuesday, October 20, 2015 1:09 PM
To: <a href="mailto:ale@ale.org">ale@ale.org</a>
Subject: Re: [ale] lsof and a hung system
But isn't systemd supposed to solve these issues?
BTW, I had to add a similar delay in the startup of a Raspberry Pi box that got systemd with the 4.1 kernel in a Debian install.
On 10/20/2015 12:25 PM, Jim Kinney wrote:
<blockquote type="cite">
Yep. The 10G card driver had oopsed all over itself and wouldn't keep
a connection up. I initially tried to stop network, unload the module,
load the module, start the network but even that failed to reset the
card completely. I needed to add a sleep 20 before loading the module
again. Once the connection was actually working the system was cleanly
rebooted to lop off the zombies and things were happily OK.
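
The reset sequence described above can be sketched roughly as follows. This is a minimal dry-run sketch, not the commands actually run on that box: the module name example_10g_driver is a placeholder (the thread does not name the driver), and "service network stop/start" and "modprobe -r" are assumptions about how the network and module were stopped and unloaded.

```python
import subprocess
import time


def build_reset_plan(module, settle_seconds=20):
    """Return the ordered steps: stop network, unload, wait, load, start network."""
    return [
        ("run", ["service", "network", "stop"]),
        ("run", ["modprobe", "-r", module]),   # assumption: unload via modprobe -r
        ("sleep", settle_seconds),             # the "sleep 20" before reloading
        ("run", ["modprobe", module]),
        ("run", ["service", "network", "start"]),
    ]


def execute_plan(plan, dry_run=True):
    """Log each step; only execute it when dry_run is False."""
    log = []
    for kind, arg in plan:
        if kind == "sleep":
            log.append("sleep %d" % arg)
            if not dry_run:
                time.sleep(arg)
        else:
            log.append(" ".join(arg))
            if not dry_run:
                # Stops at the first failing command instead of carrying on.
                subprocess.check_call(arg)
    return log


if __name__ == "__main__":
    # example_10g_driver is a hypothetical module name.
    for line in execute_plan(build_reset_plan("example_10g_driver")):
        print(line)
```

With dry_run left at True this only prints the steps, which makes it easy to check the order before running anything as root.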
On Tue, 2015-10-20 at 11:32 -0400, Ed Cashin wrote:
<blockquote type="cite">
On Mon, Oct 19, 2015 at 10:58 PM, Jim Kinney &lt;<a href="mailto:jim.kinney@gmail.com">jim.kinney@gmail.com</a>&gt;
wrote:
...
<blockquote type="cite">
The other system with the same NFS-mounted storage is fine. The storage
server is connected to both number crunchers by dedicated, unswitched
10Gbps fiber ethernet.
</blockquote>
You mean with direct connections? In that case, the other number
cruncher's connection could be fine while the affected system is
unable to reach the NFS server over its own link (for some as yet
undetermined reason), which could result in the behavior you describe
if the NFS mount is "hard".
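
One way to check whether a mount is "hard" is to look at its options in /proc/mounts. The sketch below is only an illustration: the host names, paths and addresses in SAMPLE are made up, and it treats any NFS mount without "soft" or "softerr" as hard, since hard is the default behaviour.

```python
def hard_nfs_mounts(mounts_text):
    """Return mount points of NFS filesystems that are not soft-mounted."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        try:
            device, mount_point, fs_type, options = fields[:4]
        except ValueError:
            continue  # blank or malformed line
        if fs_type not in ("nfs", "nfs4"):
            continue
        opts = set(options.split(","))
        if "soft" in opts or "softerr" in opts:
            continue
        result.append(mount_point)
    return result


# Hypothetical sample in /proc/mounts format.
SAMPLE = """\
storage:/export /data nfs rw,hard,intr,proto=tcp,addr=10.0.0.1 0 0
storage:/scratch /scratch nfs rw,soft,timeo=30,addr=10.0.0.1 0 0
/dev/sda1 / ext3 rw 0 0
"""

if __name__ == "__main__":
    print(hard_nfs_mounts(SAMPLE))
```

On a live system the same function could be fed the contents of /proc/mounts. Processes touching any mount it reports will block until the server answers again, which matches the hang described.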
</blockquote>
</blockquote>
_______________________________________________
Ale mailing list
<a href="mailto:Ale@ale.org">Ale@ale.org</a>
<a href="http://mail.ale.org/mailman/listinfo/ale">http://mail.ale.org/mailman/listinfo/ale</a>
See JOBS, ANNOUNCE and SCHOOLS lists at
<a href="http://mail.ale.org/mailman/listinfo">http://mail.ale.org/mailman/listinfo</a>
</pre></blockquote><div class="-x-evo-signature-wrapper"><span><pre>--
James P. Kinney III
Every time you stop a school, you will have to build a jail. What you
gain at one end you lose at the other. It's like feeding a dog on his
own tail. It won't fatten the dog.
- Speech 11/23/1900 Mark Twain
http://heretothereideas.blogspot.com/
</pre></span></div></body></html>