[ale] Server room issues

Jeff Hubbs hbbs at comcast.net
Thu Apr 22 09:41:40 EDT 2004


One thing I researched extensively in one of my former jobs was the idea
of extending data center UPS runtimes by automatically powering off
systems and devices according to a predetermined sequence in the event
of a protracted power outage.

We had just gotten a very old UPS replaced and I had specced it out with
the maximum number of batteries, so my idea was to hook a Linux machine
to it via RS-232 and use Expect and other scripting to talk to the
systems and to power control modules from Black Box.
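
The per-host piece of the sequencer would have looked something like the
Expect sketch below; the device, speed, prompts, account, and shutdown
command are stand-ins for illustration, not the real values:

#!/usr/bin/expect -f
# One sequencer step: raise a VAX's console line with cu, log in, and start
# an orderly shutdown.  Everything below is illustrative, not tested.
set timeout 120
set password [lindex $argv 0]

spawn cu -l /dev/ttyS1 -s 9600
expect "Connected."
send "\r"
expect "Username: "
send "SYSTEM\r"
expect "Password: "
send "$password\r"
expect {$ }
send "@SYS\$SYSTEM:SHUTDOWN\r"
# ...answer SHUTDOWN.COM's questions here, wait for the final halt message,
# then tell the Black Box module to drop power to the machine and its array.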

In parallel to this, I had planned to replace the VAX and AlphaServer
console terminals with a couple of Linux PCs with multiport serial I/O
cards running several instances of Minicom plus LCD monitors and a 2x8
KVM switch.  This way, two people could sit down and manipulate any
machine in the room without the use of Ethernet.  

The sequencing would have gone something like this:  first to go would
be the VAXen that didn't do very much anymore; the sequencer computer
connected to the UPS would log into them, shut them down cleanly, and
then switch them and their disk arrays off.  Also, a PC that was used
for Web development would go, as would the Ethernet switch serving the
rest of the building, since the desktop machines would already be dead.
Next would be the two VAXen that used to be the production platform.

VMS cluster state transitions are nasty when machines are heavily
loaded, so the sequencer would wait a while before making its next
move:  shut down the development node on the cluster.  At this point,
all that would be running would be the production cluster nodes, their
disk arrays, the Ethernet switches, the Frame Relay router and its
associated telecom gear, the KVM switch, the two sit-at PCs, and their
monitors.

At this point, the UPS would pretty much be coasting.  Some time later,
however, the users on one of the production cluster nodes would get an
automated message warning them of a shutdown; all of these users would
be in other locations and coming in via telnet over the Frame Relay
router.

One of the two production cluster nodes would then shut down along with
one of the two sit-at PCs and one of the two monitors.  The sequencer
would begin watching the battery percentage at this point.  Once
three-quarters of that charge was gone, the sequencer would shut down
and switch off the remaining sit-at PC, its monitor, and the KVM switch.

At nine-tenths gone, the sequencer would initiate a shutdown of the last
remaining production cluster node.  Once that node was shut down, the
sequencer would send a command to the disk array controller to spin down
all of the drives and then cut power to the controller.  Finally, that
last cluster node would stay powered on but halted until the batteries
drained, some five hours after the power failed.
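
Boiled down, the battery-watching logic amounts to something like the sketch
below; the UPS query and the per-device shutdown work are stubbed out, and
only the two thresholds reflect the plan described above:

#!/usr/bin/expect -f
# Last two stages of the sequence.  battery_pct and shed are stubs standing
# in for the real RS-232 conversations with the UPS and the hosts.

set charge 100
proc battery_pct {} {
    # Stub: the real sequencer would query the UPS over its serial protocol
    # and parse the percent-charge field; here we just fake a steady drain.
    global charge
    set charge [expr {$charge - 1}]
    return $charge
}

proc shed {what} {
    # Stub: log in over the console line, shut the host down cleanly, then
    # command the Black Box power control module to drop its outlet.
    puts "shutting down: $what"
}

proc wait_until_below {pct} {
    while {[battery_pct] > $pct} {
        sleep 1                 ;# would be once a minute or so in real life
    }
}

wait_until_below 25             ;# three-quarters of the charge gone
shed "remaining sit-at PC, its monitor, and the KVM switch"
wait_until_below 10             ;# nine-tenths gone
shed "last production cluster node, then its disk array controller"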

Ironically, I had done all of the R&D for this scheme in order to avoid
having to buy a $50,000 generator.  Then the whole New Product
Development department was eliminated and I was laid off before I could
do anything more than connect a Linux box to the new UPS.  The company
then went right out and bought the damn generator.  I want them to paint
"HUBBS MEMORIAL GENERATOR" on the side of it.



On Thu, 2004-04-22 at 02:37, Dow Hurst wrote:
> Here is a not-so-condensed listing of some advice from the list and others. 
> Read this as a rundown of a particular installation with some thoughts 
> injected.  I'd appreciate comments or advice if you want to add anything. 
> Names were blacked out to protect someone:
> 
> Starting Philosophy: servers should never shut down and should run 24/7; jobs
> should be checkpointable for restart in case of failure; people are more
> important than servers; we are not a 99.99999% facility
> 
> 1.  A/C issues
> 
>    3 separate systems, each with half the capacity needed to cool the server
> room, so one can fail and be repaired without downtime on the servers.  There
> is no redundant piping of coolant, so that is a point of failure.  Two window
> units (if windows are available) are installed as extra cooling capacity to
> handle either extra heat load or a main-unit failure for a short time.  The
> window units make use of the window space, which would have been a leak point
> for the A/C anyway.
> 
> Think of A/C and power as a fixed capacity that is set when the room is
> built.  No more A/C or power will be added later because of the expense, so
> plan for the heat load and power load of the next 5-10 years.
> 
> Put in a raised floor as an A/C plenum to deliver air to the underside of the
> servers.  Extremely valuable and, in XXXXX's opinion, the most important
> server room feature.  Perforated tiles allow control and tuning of the cool
> airflow in the room.  This is important since the airflow is never correct
> after the room is filled with equipment; it always needs redirection after
> the servers are installed.  The floor needs to be of high-quality design,
> with steel capable of supporting a 2000 lb rack of equipment with ease.
> XXXXX indicated that the floor should not deflect more than 1/32" for
> 4000 lb/square inch of applied force.  A steel loading-dock ramp should be
> put in, not wood, since heavy equipment will be moved up to the raised-floor
> level.
> 
> 
> An ale.org member, Jonathon Glass, said this:
> "You should really look at the IBM e325 series (Opterons) for cooling.  I have
> 4 of them (demo units) in a cluster.  I felt the cases while running a demo,
> and they were cool, if not cold, to the touch.  IBM has spent a lot of money
> on making these machines rack-ready, and cool running, and it has paid off."
> 
> Also he said:
> "How big are the Opteron nodes?  Are they 1,2,4U?  How big are the power
> supplies?  What is the maximum draw you expect?  Convert that number to figure
> out how much heat dissipation you'll need to handle.
> 
> I have a 3-ton A/C unit in my 14|15 x 14|15 server room, and the 24-33 node
> cluster I just spec'd out from IBM (1U, Dual Opterons) was rated at a max heat
> dissipation (is this the right word?) of 18,000 BTU.  According to my A/C guy,
> the 3-ton unit can handle a max of 36,000 BTU, so I'm well inside my limits.
> Getting the 3-ton unit installed in the drop-down ceiling, including
> installing new chilled water lines, was around $25K."
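
For what it's worth, that arithmetic checks out if you assume something like
160 W per 1U node (my assumption, not Jonathon's figure):

#!/usr/bin/tclsh
# Rough conversion: 1 W = 3.412 BTU/hr, 1 ton of A/C = 12,000 BTU/hr.
set nodes 33
set watts_per_node 160
set total_watts [expr {$nodes * $watts_per_node}]
set btu_per_hr  [expr {$total_watts * 3.412}]
set tons        [expr {$btu_per_hr / 12000.0}]
puts [format "%d W -> %.0f BTU/hr -> %.1f tons" $total_watts $btu_per_hr $tons]
# prints: 5280 W -> 18015 BTU/hr -> 1.5 tons, comfortably inside a 3-ton unit
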
> 
> Another ale.org member, Chris Ricker, had this to say:
> "Just to give you another price point to compare, we just spec'ed out getting
> an additional 30 tons A/C (360,000 BTUs), and it's coming in at ~$100,000.
> That's just for adding two more 15-ton units, as most of the other
> infrastructure needed for that's already there...."
> 
> 
> 2.  Power issues
> 
>    Power cabling is run under the raised floor to receptacles in the floor.
> All circuits, except a couple of 120V 20A outlets, are 240V 30A single-phase,
> with the possibility of three-phase if needed.  Large twist-lock receptacles
> were used so that unplugging a power cord has to be a deliberate action.  Some
> IBM servers need two 60A 240V three-phase circuits, since they were designed
> to replace older IBM servers that used that type of circuitry.
> 
> Grounding of the servers is through the raised floor's steel structure, which
> would be grounded through the building ground for safety.
> 
> No data cables should go inside the raised floor if possible, only power
> cabling.  A high ceiling that allows overhead data cables is ideal since
> working in the cold air under the floor to install or fix data cables is an
> unpleasant experience.  However, XXXXX says don't sacrifice the raised floor
> for overhead cable runs.
> 
> UPSes were only used on the file servers and disk arrays.  The main compute
> servers were supported by space-saving power conditioners that provide a pure
> sine wave and suppress voltage changes.  Space is at a premium in XXXXXXX and
> the city power is excellent, so this solution saved the battery space of
> large-kVA-capacity UPSes and, in the long term, the cost of battery
> replacements.  We may not have that luxury with the southern storms and
> above-ground power grid.  Power strips were put in that have digital readouts
> showing the current amperage used on the circuit.  These readouts let you know
> your amperage per circuit in real time and tune the load per circuit, since
> most server power supplies draw less current than their rating.  Plus, the
> strips will turn on each outlet in a timed sequence during power-up so the
> server loads are staged and don't hit the circuits all at once.  Dial-in modem
> control or terminal server control is available on the power strips if wanted.
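
As a sanity check on per-circuit loading, the arithmetic those readouts enable
is simple; the 80% derating and the per-server draws below are made-up numbers
for illustration, not readings from the strips:

#!/usr/bin/tclsh
# Headroom check for one 30 A / 240 V circuit.
set circuit_amps 30
set usable_amps [expr {$circuit_amps * 0.80}]   ;# common continuous-load derating
set draws {4.5 4.5 3.8 3.8 2.9}                 ;# amps per server, off the readout
set total 0.0
foreach a $draws { set total [expr {$total + $a}] }
puts [format "%.1f A used of %.1f A usable (%.1f A headroom)" \
        $total $usable_amps [expr {$usable_amps - $total}]]
# prints: 19.5 A used of 24.0 A usable (4.5 A headroom)
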
> 
> Jonathon Glass had this to say:
> "Just for the cluster, I have a 6kVa BestPower UPS.  It'll run all 16 nodes
> for about 15 minutes."
> 
> Jeffrey Layton chimed in with this thought:
> "We run CFD codes (Computational Fluid Dynamics) to explore fluid flow over
> and in aircraft.  The runs can last up to about 48 hours.  Our codes
> checkpoint themselves, so if we lose the nodes (or a node since we're running
> MPI codes), we just back up to the last checkpoint.  Not a big deal.  However,
> if we didn't checkpoint, I would think about it a bit.  48 hours is a long
> time.  If the cluster dies at 47:59 I would be very upset.  However, if we're
> running on a cluster with 256 nodes with UPS, and if getting rid of UPS means
> I can get 60 more nodes, then perhaps I could just run my job on the extra
> nodes and get done faster (reducing the window of vulnerability, if you will).
> 
> You also need to think about how long the UPSes will last.  If you need to
> run 48 hours and the UPS kicks in at about 24 hours, will the UPS last 24
> hours?  If not, you will lose the job anyway (with no checkpointing) unless
> you get some really big UPSes.  So in this case, a UPS won't help much.  However, it would
> help if you were only a few minutes away from completing a computation and
> just needed to finish (if it's a long run, the odds are this scenario won't
> happen often).  If you could just touch a file and have your code recognize
> this so it could quickly checkpoint, then a UPS might be worth it (some of
> our codes do this).  We've got generators that kick in about 10 seconds after
> power failure.  And the best thing is that they get tested every month (I can
> tell you stories about installations that never tested their diesel).
> 
> However, like I mentioned below, the ultimate answer really depends.  If I can
> tolerate the loss of my running apps, then I can take the money I would dump
> into UPS and diesel and buy more nodes.  If your codes can't or don't
> checkpoint, then you might consider UPS and diesel.  If you have a global file
> system on the nodes (like many people are doing today) that you need up or you
> need to at least gracefully shutdown, then consider a UPS and/or diesel.
> 
> I guess my ultimate question is, "Is UPS and diesel necessary for all or part
> of the cluster?"  There is no one correct answer.  The answer depends upon the
> situation.  However, don't be boxed into a corner that says you have to have
> UPS and diesel."
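
The "touch a file and have the code checkpoint" idea Jeffrey mentions can be
as small as a watcher script like the one below; the flag file name, the
polling interval, and the USR1 signal convention are my assumptions, not his
setup:

#!/usr/bin/expect -f
# Watch for a flag file next to a running job and ask the solver to write a
# restart file when it appears.  Pass the solver's PID as the only argument.
set flag /scratch/CHECKPOINT_NOW
set solver_pid [lindex $argv 0]

while {1} {
    if {[file exists $flag]} {
        file delete $flag
        # Nudge the solver to checkpoint; many codes instead poll for the
        # flag file themselves, which avoids the signal entirely.
        exec kill -USR1 $solver_pid
    }
    sleep 30                    ;# check twice a minute
}
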
> 
> 
> 3.  Fire Detection versus Suppression
> 
>    XXXXX felt suppression systems might endanger the life of someone in the
> server room when actuated, so a fire detection system was installed.  This
> detection system is wired into the A/C and power so it can turn them off if
> fire is present.  The logic is that the A/C or an overheating circuit would
> be the most likely place for a fire to start, so cutting power to the A/C and
> the servers would be the most likely way to stop the problem.
> 
> Also, insurance for major equipment items is written in as a rider on the
> building or institution's insurance policy.
> 
> Fire suppression systems have come a long way, and I am getting info on them.
> Gas mixtures that suppress fire but still allow people to breathe are
> available and are considered the norm for business server rooms.  A
> particulate-based system that leaves no residue is also available.  I've
> started the process of getting info, but exact room dimensions are required
> to quote accurately.  Probably $15-25K is about right for a suppression system.
> 
> Building codes are strict in XXXXXXX, so whatever the building codes require
> is what they had to live with.  The sprinkler heads can be switched to
> high-temperature heads and the piping can be isolated from other areas to
> prevent disasters, but we may not be able to eliminate sprinklers from the
> area.  XXXXX explained that an extension cable is illegal in their server
> rooms and in most of XXXXXXX due to building codes.  In the South, codes will
> be much more relaxed.
> 
> Jonathon Glass said:
> "I do have sprinkler fire protection, but that room is set to release its
> water supply independent of the other rooms. Also, supposedly, the fire
> sprinkler heads (whatever they're called) withstand considerably more heat
> than normal ones.  So, the reasoning goes, if it gets hot enough for those to
> go off, I have bigger problems than just water.  Thus, I have a fire safe
> nearby (in the same bldg...yeah, yeah, I know; off-site storage!) that holds
> my tapes, and will shortly hold a hardware inventory and admin password list
> on all my servers."
> 
> 
> 4.  Room Location Caveats
> 
>    Don't be far from the loading dock to ease the movement of equipment into
> the server room.  Elevators, stairs, tight turns, doorways, and the
> possibility of flooding or corrosive gases are obstacles we don't want our
> room to be near.  XXXXX has one room at the level of the xxxx river, so
> flooding is a concern.  He mentioned that a large drain in the center of the
> floor is always nice to have.  Large servers may not fit through doorways or
> in elevators.  IBM ships a large server in two pieces at an extra charge of
> thousands of dollars because some installations can't fit the server through
> a door or up a stairway.  XXXXX and XXXXX ran into this on their P960 IBM
> server.  They got the installation for free, but it was a pain to deal with.
> 
> A continuous-hinge steel door is good for sealing in cool air and discouraging
> theft, but it is hard to remove for equipment installation.  A key lock works
> when there is no power, battery or otherwise. ;-)  A keycard electronic lock
> is good when multiple employees enter the room, since you can track entry
> times and card codes, but the lock should be able to be opened in case of loss
> of power.  (Batteries are used a lot for this, I think.  I'll ask XXX how the
> keycard locks work at KSU since I ought to know that anyway.)
> 
> 
> 5.  Server Cabinets
> 
>    Most standard clusters come in standard racks except for blades.  Some
> special large servers like the IBM P960 come in special racks that are a
> required purchase.  The SGI Altix can be put in a standard rack.  So, APC
> makes nice standard racks that can be purchased.  We would then install the 
> Altix parts into the standard rack and go from there.  Skip the front doors on 
> the racks unless you share the server room and feel you must lock up the 
> servers.  Keep the back doors since cabling and sensitive connectors need 
> protection.  Standard racks can come in half rack, full rack, and extra tall 
> rack sizes.  The extra tall sizes won't fit in elevators and may need some 
> special help to get into the server room.  A 47U rack full of servers will 
> easily weigh 2000lb, so expect to place them carefully before filling them up! 
>   Sliding rails are preferred for compute nodes but not preferred for disk 
> arrays.  Disk arrays usually have disks that are removable from the front for
> hot-swap replacement, so sliding the array out doesn't help.  Definitely high
> on XXXXX's list of must-haves is an LCD and keyboard mounted on a sliding
> rail.  You use this to access the servers via a serial connection, so it is
> immune to network problems.  A terminal server for serial access or a KVM
> switch is recommended.  They have a $20K Raritan KVM switch, while others
> rave about a Cyclades terminal server.  I'm familiar with both products
> through articles and advertisements, but have only used cheap 4-port or
> 2-port KVM switches myself.  A KVM or terminal server setup might handle up
> to 128 or 256 console connections that are hardwired via serial cables or a
> special backplane connection.  The IBM blades have a special backplane that
> has the network wiring and console wiring embedded in it.
> 
> You need the steel loading ramp when your delivery guys take a running start
> to get a 1 ton server up onto the raised floor!
> 
> XXXXX has two 30A 240V circuits per cabinet installed in the floor.  I imagine
> these are in the floor just past the rear of the cabinet's footprint.
> 
> Eleven inches are needed for overhead cable raceways if overhead cable runs
> are put in place.
> 
> At XXXXX they have one server room for proprietary servers and one room for
> standard rack based servers.
> 
> 
> 6.  Networking
> 
> Not discussed yet.
> 
> 


