[ale] uptime?

Robert Hoffman rob at frankenlinux.com
Fri Feb 23 06:37:26 EST 2001


I've been reading all of the uptime mails and have finally put my finger on a couple of things that have been bothering me.

There are three reasons a service goes down (barring power failure):
1. Hardware dies
2. The OS craps out
3. The App craps out

Achieving our goal, uninterrupted service for the users, requires that we make good choices in all three areas. Frankly, it infuriates me when a service goes down. I have too many things to do to spend time trying to figure out why a service died. One of the reasons I use Linux is that it removes failure reason #2, an unreliable OS.

When an App dies on Windows, you rarely know whether it was a problem with the operating system or the application. With Linux, is it ever really a question? I believe that answer is what the uptimes are telling us.

As for solving the problems with hardware and apps:

Part of this discussion has revolved around the load and number of services installed on the server. It is well known that loading up a bunch of software on a Windows box is asking for trouble. I know this to be true from personal experience. Windows can be made reasonably stable if you treat it with kid gloves. I have a Win NT PDC that's been running for over 6 months; that's all it does, and I haven't even installed antivirus software on it. We just had to reboot an Exchange server that actually managed to go about 5 months before locking up. 5 months isn't bad, but it ain't a year (my qmail relay has never crapped out). What's really annoying is that these kinds of uptimes on Windows are the exception, not the rule.

I also know from personal experience that I can load up a Linux server with all kinds of software and it won't even flinch.

I read Jeff's post about building one-task servers, and something about it has been bothering me. What I don't like is that the one-task, one-server mentality is a carryover from supporting Windows for too long. I'm guilty of the same damn thing myself. I finally realized it when someone wrote to me pointing out that my uptimes showed practically no load on any of my servers.

What this means is that I had my company waste money buying extra machines to run a bunch of services that I should have installed on our existing Samba server. Hell, the Samba server even has a redundant power supply and a SCSI RAID 5 array. Lesson to me: buy fewer servers with better, redundant hardware and run more services on them. If I don't do this, then I'm not really taking full advantage of Linux, am I? This also helps eliminate service failure reason #1, hardware failure.

As for Applications failing, we just need to pick ones that work reliably. I've had Borland's AppServer die twice on me in the last two months for no good reason and on perfectly good Sun hardware. Clearly, apps like this must be destroyed and replaced with something better.

I have one final bug up my rear I'd like to get out. Uptimes are merely one indication of an OS's reliability and should be taken for nothing more. However, focusing solely on service interruptions noticeable to end users frequently degrades into the practice of rebooting Windows weekly to improve reliability. I hate the concept of doing "Scheduled Maintenance" on servers. I don't want to drive into the office to reboot servers at 10 PM on Sunday night so they don't lock up during the workweek. My time is more valuable than that.

</rant mode off>

-Rob Hoffman