[ale] After 15 years, Nohup is still broken???

Chris Fowler cfowler at outpostsentinel.com
Wed Jul 31 11:24:53 EDT 2019


> From: "Lightner, Jeffrey" <JLightner at dsservices.com>
> To: "Chris Fowler" <cfowler at outpostsentinel.com>, "Atlanta Linux Enthusiasts"
> <ale at ale.org>, neal at mnopltd.com
> Sent: Wednesday, July 31, 2019 11:01:40 AM
> Subject: RE: After 15 years, Nohup is still broken???

> I gather you're doing an ssh from one system to another. I wonder if your shell
> default on either the system you're coming from or the second system has the
> TMOUT variable set and it thinks the session is idle due to lack of output. If
> so, doing an unset TMOUT might solve it. Note I have seen issues where firewall
> devices will time out sessions such as VPN, but I can fool them by running a
> while loop in another window to just display data to the screen every 5
> minutes.

Nope, this is typical of these firewalls. If no packets are sent for N minutes, the firewall deletes the connection from its tracking table. This behavior is consistent, reliable, and repeatable. Once I tweaked the keepalives (KAs), all these issues went away.
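
A toy model may make the mechanism concrete. The sketch below is not how any particular firewall is implemented; the function name, the 300-second idle timeout, and the 60-second keepalive interval are all assumptions picked for illustration. It only shows why a session that stays silent longer than the timeout loses its tracking entry, while one that sends something more often than the timeout keeps it.

```python
def connection_survives(idle_timeout_s, keepalive_interval_s, session_length_s):
    """Return True if a toy firewall still tracks the connection at the end.

    keepalive_interval_s=None models a session that sends nothing until
    real traffic resumes at session_length_s.
    """
    last_packet = 0  # time the firewall last saw a packet on this connection
    now = 0
    while now < session_length_s:
        if keepalive_interval_s is None:
            next_packet = session_length_s
        else:
            next_packet = min(now + keepalive_interval_s, session_length_s)
        # The entry is dropped once the silent gap exceeds the idle timeout.
        if next_packet - last_packet > idle_timeout_s:
            return False
        now = last_packet = next_packet
    return True


if __name__ == "__main__":
    # Hypothetical numbers: 5-minute firewall timeout, 1-hour session.
    print("silent session survives:", connection_survives(300, None, 3600))
    print("60 s keepalives survive:", connection_survives(300, 60, 3600))
```

The only thing that matters in this model is that the keepalive interval is shorter than the firewall's idle timeout.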

I keep the KAs tight, and on my distributed systems I use libkeepalive via LD_PRELOAD to FORCE KA on all TCP sockets, regardless of what the programmer of that program wanted. It may be extreme, but in the past I communicated with these systems via IP over modem or their Ethernet, and I had sessions that would hang forever, causing all sorts of problems, especially with modems.

To test that I had rectified the symptoms over modems, I would simply open a TCP socket in Perl to a remote system from a box in my lab that had a modem. The modem would dial and then the socket would connect. I'd then do a read on the socket, making sure that the read would block, and then I'd pull the phone line from my simulator or the real AT&T. I waited to see how the read would respond.

I also had issues with APIs that called select() with no timeout and blocked forever because there was no KA on the TCP socket. libkeepalive solved all those issues. The problematic program that forced the use of libkeepalive was using SOAP in Perl.
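
For code you can change, giving select() a timeout is a complementary guard (libkeepalive is for the cases where you cannot). This is a minimal Python sketch, not the Perl code from that test; wait_readable is a made-up helper name, and socketpair() stands in for a real TCP connection whose peer has gone quiet.

```python
import select
import socket


def wait_readable(sock, timeout_s):
    """Return True if sock has data to read within timeout_s seconds."""
    # Passing a timeout means select() gives up instead of blocking forever.
    readable, _, _ = select.select([sock], [], [], timeout_s)
    return bool(readable)


if __name__ == "__main__":
    # socketpair() is a local stand-in for a connection to a remote peer.
    local, peer = socket.socketpair()
    print("silent peer readable:", wait_readable(local, 0.2))
    peer.sendall(b"hello")
    print("after data readable:", wait_readable(local, 0.2))
    local.close()
    peer.close()
```

A timeout only tells the caller that nothing arrived in time; deciding whether the peer is actually dead is still up to keepalives or the application.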

I've spent hours dealing with NAT issues, and when the OP said "Sonic Firewall" the alarms went off in my head. SSH does not need libkeepalive; it only needs the options I sent earlier. libkeepalive is for programs that have no such options or that you cannot change.

[root at localhost]# ka-test 
SO_KEEPALIVE is OFF 
[root at localhost]# env LD_PRELOAD=libkeepalive.so ka-test 
SO_KEEPALIVE is ON 
TCP_KEEPCNT = 3 
TCP_KEEPIDLE = 60 
TCP_KEEPINTVL = 10 
[root at localhost]# 
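
For programs you do control, the same settings can be applied in code rather than through LD_PRELOAD. The following is a sketch in Python, assuming Linux; it is not the source of ka-test or libkeepalive, and enable_keepalive and keepalive_enabled are names invented here. The values 60, 10, and 3 are copied from the output above.

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=10, count=3):
    """Turn on TCP keepalive with the timings shown in the output above."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The TCP_KEEP* constants are the Linux names; other platforms may lack
    # them, so each one is only set if this Python build exposes it.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", count)):
        if hasattr(socket, name):
            sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, name), value)


def keepalive_enabled(sock):
    """Report whether SO_KEEPALIVE is set on the socket."""
    return bool(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print("SO_KEEPALIVE is", "ON" if keepalive_enabled(s) else "OFF")
    enable_keepalive(s)
    print("SO_KEEPALIVE is", "ON" if keepalive_enabled(s) else "OFF")
    s.close()
```

With these values on Linux, an unresponsive peer should be noticed after roughly 60 + 3 x 10 = 90 seconds of silence: one idle period, then three unanswered probes ten seconds apart.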

"With most NAT devices, the NAT session limit is bound by the available memory in the device. Each NAT translation consumes about 160 bytes in the device's memory. As a result, 10,000 translations (a lot more than would normally be handled by a small router) will consume about 1.6 MB of memory. Therefore, a typical routing platform has more than enough memory to support thousands of NAT translations but in practice the story (as always) is different.

Typically on smaller Cisco routers, e.g 700, 800, 1600 series, that have an IOS with NAT capabilities, the number of NAT sessions they are able to track is around 2000 without much trouble but this also depends on the NAT mode being used. Pump that up to something like 3000 to 4000 sessions and you start having major problems as the NAT table gets too big for the router's CPU to manage. As you see, it's not only a memory issue :) This is when you start to see big delays in ping replies and eventually an exponential increase in packet loss. 

We've actually seen Cisco routers having some problems while handling NAT translations ( NAT Overload mode in particular ). This was also confirmed by Mike Sweeney - a good friend of Firewall.cx, so keep in mind that the earlier Cisco IOS (pre v12.4.5) seems sometimes to behave a bit weird with NAT. 

The larger router models and dedicated gateway/firewall appliances are able to track a lot more connections simultaneously (8000 to 25000), which makes them ideal for large corporations that need such capacity." 
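
The memory arithmetic in the quoted passage is easy to check. The 160-byte figure is the passage's own approximation; the function name and the sample table sizes are just for illustration.

```python
BYTES_PER_TRANSLATION = 160  # approximate per-entry cost, per the quote above


def nat_table_bytes(translations):
    """Estimate the memory used by a NAT table with this many entries."""
    return translations * BYTES_PER_TRANSLATION


if __name__ == "__main__":
    # Sample sizes taken from the ranges mentioned in the quoted passage.
    for n in (2_000, 10_000, 25_000):
        print(n, "translations:", nat_table_bytes(n) / 1_000_000, "MB")
```

Even the largest table size mentioned comes to a few megabytes, which fits the passage's point that CPU, not memory, is the practical limit.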