[ale] uptime?

Thu Feb 22 15:20:18 EST 2001

I've been rattling this around in what I use for a head, along with the
other responses. Lots of anecdotal evidence and opinions, plenty of good
flame bait, and nothing really useful seems to be developing. Well,
Brian's response heads in a direction I suspect is useful, and the theme
seems worth elaborating on.

By way of summary of what appears to be going on:

1) Single machine uptime (time between reboots) is of very limited value
as a figure of merit with respect to the stability of the software.

2) % of time a particular service is available on average is the user's
perception of stability.

3) Unplanned service interruptions are frequently expensive in terms of
administrator and user time, if nothing else. Usually there will also be a
dollar cost, which should be estimable in many cases.

4) Good administrators with good hardware & proper loading have better
uptimes than less good administrators with less good hardware and improper
loading.
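Point 2 can be made concrete. As a rough sketch (the function names, the 30-day window and the outage durations below are made up for illustration, not measurements), availability is the fraction of an observation window in which the service was actually up, and any availability figure translates directly into expected downtime per year:

```python
def availability(period_hours: float, outage_hours: list[float]) -> float:
    """Fraction of the observation period during which the service was up."""
    down = sum(outage_hours)
    if not 0 <= down <= period_hours:
        raise ValueError("outages must fit inside the observation period")
    return (period_hours - down) / period_hours


def annual_downtime_hours(avail: float) -> float:
    """Expected hours of downtime per 365-day year at a given availability."""
    return (1.0 - avail) * 365 * 24


if __name__ == "__main__":
    # Hypothetical log: one 30-day month with three outages, in hours.
    a = availability(30 * 24, [2.0, 0.5, 1.5])
    print(f"measured availability: {a:.4%}")
    for target in (0.99, 0.995, 0.999):
        print(f"{target:.1%} -> {annual_downtime_hours(target):.1f} h/year down")
```

At 99% that is roughly 87.6 hours of downtime a year; at 99.9%, under 9 hours.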

Given that simple uptimes give no real usable information to beat
different OS's with (and granting the importance of OS stability), is
there a relatively good figure of merit which allows the rating of
stability?

Working from Brian's start, and assuming the (apparently) common practice
of running different services on separate hardware:

I assume a web site/complex running the web server, a database, and a
firewall et al. For the site as a whole to achieve 99% availability, all
three machines must be providing their service (ignoring failover schemes
for the moment). Thus, each machine must be better than 99% available to
achieve the desired reliability. (If the machines fail independently, I
make it about 99.67% minimum for each machine, since 0.9967 cubed is just
over 0.99.)
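A quick sketch of the series arithmetic, assuming the machines fail independently (so their availabilities multiply) and there is no failover; the function names are my own. It also bears on Brian's examples quoted below: literally multiplying 100 availabilities of 98% gives 0.98^100, roughly 13% (and 0.95^100 is about 0.6%), whereas the quoted ~91% and ~79% are 100^0.98 and 100^0.95, a different quantity. His point that one weak link drags everything down holds, only more so.

```python
import math


def series_availability(avails: list[float]) -> float:
    """Availability of a chain in which every component must be up.

    Assumes failures are independent, so the probabilities multiply.
    """
    return math.prod(avails)


def per_component_target(overall: float, n: int) -> float:
    """Minimum equal availability each of n series components needs."""
    return overall ** (1.0 / n)


if __name__ == "__main__":
    print(f"each of 3 machines needs {per_component_target(0.99, 3):.2%}")
    print(f"3 machines at 99.5%: {series_availability([0.995] * 3):.2%}")
    print(f"100 machines at 98%: {series_availability([0.98] * 100):.1%}")
    print(f"100 machines at 95%: {series_availability([0.95] * 100):.2%}")
```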

Having the ability to fail over lowers the individual reliability
requirements, but durned if I know how to approximate them. I _think_ the
individual machines can be less reliable if services can be switched
between machines. I am of course assuming that addition of a service to a
loaded machine will only slow the service down, and _not_ change the
stability of the machine.
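One way to approximate the failover case, under strong simplifying assumptions (independent failures, instant and perfect switchover, and a standby that can absorb the extra load without itself failing): a service that can run on any of k machines is down only when all k are down, so its availability is 1 - (1 - a)^k. Chaining that with the series rule gives a per-machine requirement. The function names are illustrative, not from any tool.

```python
def parallel_availability(a: float, k: int) -> float:
    """Availability of a service that stays up while any of k machines is up.

    Assumes independent failures and instant, perfect failover.
    """
    return 1.0 - (1.0 - a) ** k


def required_machine_availability(overall: float, services: int, k: int) -> float:
    """Per-machine availability needed when all `services` must be up
    (in series) and each service can fail over across k machines."""
    per_service = overall ** (1.0 / services)
    return 1.0 - (1.0 - per_service) ** (1.0 / k)


if __name__ == "__main__":
    # No failover (k=1): same answer as the plain series case.
    print(f"k=1: {required_machine_availability(0.99, 3, 1):.2%} per machine")
    # Each service has one dedicated standby (k=2).
    print(f"k=2: {required_machine_availability(0.99, 3, 2):.2%} per machine")
    # Three services, each on a pair of 95% machines.
    print(f"site: {parallel_availability(0.95, 2) ** 3:.2%}")
```

If, as assumed above, any one surviving machine can carry all three services, the whole site is up whenever at least one of the three machines is, which is parallel_availability(a, 3). That is the most optimistic reading, since it ignores switchover time and the extra load.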

Adding concepts like the average cost of lost jobs and so forth is way
beyond my current skills.
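As a first step toward putting a dollar figure on it, here is a back-of-the-envelope expected-value sketch. Every parameter (jobs per hour, cost per lost job, admin cleanup hours and hourly rate, outages per year) is a hypothetical input you would have to estimate for your own site, and the class name is mine; the example numbers are invented purely for illustration.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 365 * 24


@dataclass
class SiteCosts:
    # Hypothetical site-specific estimates, not real data.
    jobs_per_hour: float           # average work arriving while the service is up
    cost_per_lost_job: float       # dollars lost per job that cannot be served
    admin_hours_per_outage: float  # cleanup effort per unplanned interruption
    admin_rate: float              # dollars per administrator hour

    def expected_annual_cost(self, availability: float, outages_per_year: float) -> float:
        """Expected yearly cost: lost work during downtime plus cleanup per outage."""
        down_hours = (1.0 - availability) * HOURS_PER_YEAR
        lost_work = down_hours * self.jobs_per_hour * self.cost_per_lost_job
        cleanup = outages_per_year * self.admin_hours_per_outage * self.admin_rate
        return lost_work + cleanup


if __name__ == "__main__":
    site = SiteCosts(jobs_per_hour=50, cost_per_lost_job=2.0,
                     admin_hours_per_outage=3, admin_rate=40)
    print(f"99%,   6 outages: ${site.expected_annual_cost(0.99, 6):,.0f}")
    print(f"99.9%, 2 outages: ${site.expected_annual_cost(0.999, 2):,.0f}")
```

The model simply splits the cost into lost work, proportional to total downtime, and cleanup, proportional to the number of outages, so many short outages and one long one of the same total length come out differently.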

Thanks to all of you for letting me babble out loud. 

On Wed, 21 Feb 2001, Brian J. Dowd wrote:

> How about a totally fabricated pseudo-scientific answer?
> 
> Take the number of computers in a network which have
> shareable data on them. Multiply the "uptimes" together
> to show the effects on availability of data.
> Example  #1 100 computers with uptimes of 98% is 100^.98 = ~91%
> Example #2 100 computers with uptimes of 95% is 100^.95 = ~79%
> In a truly distributed environment where you may *think* that
> an uptime of 95% (Win) is ok, the effect on the other systems
> can be, in fact, quite poor (79%). What is even worse? Add just
> one Windows machine into a network of Linux machines and see
> what the transitive value of one poor percentage does to the final number.
> -Brian J. Dowd
> 
> > Hey! This _has_ to be a great question! (Largely due to the "calculate
> > how..." portion: quantify things!)
> >
> > Regrettably, much as I admire the question, I can only make qualitative
> > comments. Not quite the useful information requested.
> >
> > By training from years ago on DEC minis, any system which takes a
> > three-finger salute as part of its maintenance gets a one-finger salute
> > at the same time. No cost/benefit analysis is known to me, however.
> >
> > Likewise, when I sit to the keyboard, I expect things to be ready. I'm too
> > sloppy to worry about the last time the stupid thing was rebooted.
> >
> > I'm looking forward to reading _real_ answers now.
> >
> > On Wed, 21 Feb 2001, Robert L. Harris wrote:
> >
> > >
> > >
> > >   Was watching a discussion the other day.  Someone made a snide
> > > anti-linux comment about everyone pro-linux being so impressed by uptimes
> > > and how useless they were.
> > >
> > >
> > >   At any rate, I started wondering what good is an uptime in reality?  I'd
> > > like to know a real good use for it.  You can calculate how long since the
> > > last crash, etc.  I have my own reasons, but what are other good uses for
> > > this information?
> > >
> > > Robert
> > >
> > >
> > > :wq!
> > > ---------------------------------------------------------------------------
> > > Robert L. Harris                |  Micros~1 :
> > > Senior System Engineer          |    For when quality, reliability
> > >   at RnD Consulting             |      and security just aren't
> > >                                 \_       that important!
> > > DISCLAIMER:
> > >       These are MY OPINIONS ALONE.  I speak for no-one else.
> > > FYI:
> > >  perl -e 'print $i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10);'
> > >
> > > --
> > > To unsubscribe: mail majordomo at ale.org with "unsubscribe ale" in message body.
> > >
> >
> > --
> > ===========================================
> > The harder I work, the luckier I get.
> >                     Lee Iacocca
> > ===========================================
> > Thompson Freeman          tfreeman at intel.digichem.net
> >
> 
> 

-- 
===========================================
The harder I work, the luckier I get.
                    Lee Iacocca
===========================================
Thompson Freeman          tfreeman at intel.digichem.net
