<p dir="ltr">I thought Lustre went closed source and the last GPL version became Gluster.<br>
OrangeFS has been on my radar for a while, but I haven't implemented it yet. It looks promising.</p>
<div class="gmail_quote">On Jan 3, 2014 12:44 PM, "Vernard Martin" <<a href="mailto:vernard@gmail.com">vernard@gmail.com</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
NFS is ubiquitous but has issues. Try out OrangeFS or Lustre :-)<br>
<br>
<br>
<div>On 1/3/2014 10:49 AM, Jim Kinney wrote:<br>
</div>
<blockquote type="cite">
<p dir="ltr">That makes more sense. It would _have_ to be a kernel
thread that choked, so NFS is a highly likely culprit. An NFS share
of an iSCSI or Fibre Channel connection that drops and times out
has been involved.</p>
<p dir="ltr">If NFS weren't so useful, I would never use it again.</p>
<div class="gmail_quote">On Jan 3, 2014 10:22 AM, "Derek Atkins"
<<a href="mailto:derek@ihtfp.com" target="_blank">derek@ihtfp.com</a>>
wrote:<br type="attribution">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Jim,<br>
<br>
On Fri, January 3, 2014 10:03 am, Jim Kinney wrote:<br>
> That's the "well behaved" process. I'm looking for a solution at the<br>
> kernel control level that can alter the list of ports the kernel manages<br>
> for the aberrant process that hangs with an open port and dies leaving it<br>
> open. It feels like a kernel bug to have an open port with no process<br>
> attached. Closing a port while the owning process is still running would<br>
> be a useful tool for testing that process's response to a system failure.<br>
<br>
That would be a major kernel bug. When a process dies (not as a zombie,<br>
but actually gets reaped), all of its open ports get closed. I've never<br>
seen a kernel fail to reap that properly (although there are socket<br>
options, like SO_REUSEADDR, that let a port be reused quickly).<br>
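A minimal sketch of the port-reuse point above, in Python. It assumes a Linux-like TCP stack on 127.0.0.1, and the helper names bind_listener and restart_cycle are hypothetical. Closing the sockets (the same cleanup the kernel does for a process's open sockets when it exits) frees the port, and setting SO_REUSEADDR lets a restarted listener bind that port right away even while the earlier connection may still be in TIME_WAIT.<br>

```python
import socket


def bind_listener(port: int) -> socket.socket:
    """Hypothetical helper: bind a TCP listener on 127.0.0.1 with SO_REUSEADDR set."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow bind() to succeed while old connections on this port sit in TIME_WAIT.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("127.0.0.1", port))
    s.listen(1)
    return s


def restart_cycle() -> bool:
    """Hypothetical helper: simulate a server exiting and restarting on the same port."""
    # First "server": port 0 asks the kernel to pick a free port.
    server = bind_listener(0)
    port = server.getsockname()[1]

    # Make one real connection so the port has had live TCP traffic.
    client = socket.create_connection(("127.0.0.1", port))
    conn, _ = server.accept()

    # The server end closes first, which normally leaves that end in TIME_WAIT.
    conn.close()
    client.close()
    # Closing the listener mirrors what the kernel does for a process's sockets on exit.
    server.close()

    # Second "server" rebinds the same port immediately.
    restarted = bind_listener(port)
    ok = restarted.getsockname()[1] == port
    restarted.close()
    return ok


if __name__ == "__main__":
    print("rebind succeeded:", restart_cycle())
```

Without SO_REUSEADDR on the listeners, the immediate rebind can fail with "Address already in use" while the old connection lingers in TIME_WAIT, which looks like a port nobody owns even though the kernel is behaving as designed.<br>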
<br>
More likely what happened here is that a *kernel thread* died and did not<br>
get cleaned up properly. E.g., if NFS was running in the kernel and a<br>
mountpoint died in a mysterious way, it's possible that the thread didn't<br>
get reaped properly. Unlikely, but possible.<br>
<br>
But yes, it is probably a kernel bug you are seeing.<br>
<br>
-derek<br>
<br>
--<br>
Derek Atkins <a href="tel:617-623-3745" value="+16176233745" target="_blank">617-623-3745</a><br>
<a href="mailto:derek@ihtfp.com" target="_blank">derek@ihtfp.com</a>
<a href="http://www.ihtfp.com" target="_blank">www.ihtfp.com</a><br>
Computer and Internet Security Consultant<br>
<br>
_______________________________________________<br>
Ale mailing list<br>
<a href="mailto:Ale@ale.org" target="_blank">Ale@ale.org</a><br>
<a href="http://mail.ale.org/mailman/listinfo/ale" target="_blank">http://mail.ale.org/mailman/listinfo/ale</a><br>
See JOBS, ANNOUNCE and SCHOOLS lists at<br>
<a href="http://mail.ale.org/mailman/listinfo" target="_blank">http://mail.ale.org/mailman/listinfo</a><br>
</blockquote>
</div>
</blockquote>
<br>
</div>
<br></blockquote></div>