[ale] PC power supply voltages

James P. Kinney III jkinney at localnetsolutions.com
Sat Feb 17 13:39:46 EST 2007


On Fri, 2007-02-16 at 09:29 -0500, Jerry Yu wrote:
> James, that's very informative. Do you have a link or article that
> elaborates on this voltage vs. system/data integrity relationship?

I don't have any hard links other than ones from places like solid-state
physics research journals. I have seen this in practice and know the
effect from the physics of the semiconductor devices involved.

Over-voltage - low grade: similar to, but not as severe as, static shock.
Much semiconductor data processing happens using gate tunneling. The
tunnel acts as the flow control. Higher voltage means higher energy and
more tunneled flow. Some systems are designed to have "over-flow
buckets" that can capture the current above an expected amount. But that
costs die space, and in things like CPUs it is not really an option. Just
as higher CPU voltage allows overclocking by over-energizing the clock
(and making it tick faster), it also causes a heat buildup. Heat
causes atomic-level movement within the lattice structure. In
particular, the "impurity" in transistor junctions is susceptible to
thermal migration. Migration changes the characteristics of the junction
permanently and thus leads to device failure.

Over-voltage - high grade: not quite into the static shock range. A
device that normally uses 1.5 volts is slowly ramped up to 3 volts due
to a failing power supply control capacitor. The tunneling current rises
and causes a localized heating in the ramp area. This can cause a rapid
migration leaving a physical change in the barrier material that is
annealed into place. Usually, this will make a ramp lose its potential
height and thus allow tunnel current with a much lower potential than
designed. This leads to higher current, thus more heating and the
process degrades rapidly. I don't recall whether it is pnp or npn
junctions that are most affected by this. This process is also
responsible for conductor creep effects in the metallic contact regions
of CPUs and the like.
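
The feedback loop described above (more current, more heat, lower
barrier, more current again) can be sketched as a toy iteration in
Python. Every constant and update rule below is invented purely for
illustration; this is not a device model, and only the 1.5 V and 3 V
figures come from the example above.

```python
# Toy positive-feedback loop: current -> heating -> barrier erosion.
# All constants and rules are hypothetical, chosen only to show the shape
# of a runaway; they do not describe any real semiconductor device.

def simulate_runaway(volts, steps=50, barrier=1.0, ambient=300.0):
    """Iterate the loop and return (step, current, temperature, barrier) rows."""
    history = []
    for step in range(steps):
        # Hypothetical rule: current rises as the barrier height falls.
        current = volts / barrier
        # Hypothetical rule: temperature rise tracks dissipated power.
        temperature = ambient + 20.0 * volts * current
        # Hypothetical rule: the barrier only erodes above 350 (arbitrary units),
        # and never drops below a floor of 0.05.
        if temperature > 350.0:
            erosion = 1.0 - 0.001 * (temperature - 350.0)
            barrier = max(0.05, barrier * erosion)
        history.append((step, current, temperature, barrier))
    return history


if __name__ == "__main__":
    for volts in (1.5, 3.0):
        first, last = simulate_runaway(volts)[0], simulate_runaway(volts)[-1]
        print(f"{volts} V: current {first[1]:.2f} -> {last[1]:.2f}, "
              f"barrier {first[3]:.2f} -> {last[3]:.2f}")
```

With these made-up numbers the 1.5 V case never crosses the erosion
threshold and stays put, while the 3 V case feeds on itself until the
barrier hits the floor.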

Under-voltage: Data corruption happens mostly in memory storage. A "0"
is a tiny voltage between two defined values and a "1" is typically a
higher voltage range. Under-voltages can spread the ranges and are
normally accounted for in good chipset design (tight control tolerances
that require retransmission - good data integrity but bad performance
vs. looser tolerances for higher performance but a higher risk of
bit-flip on storage). The "grey zone" is when something is flaky and the
"1" region begins to overlap the "0" range. The most common effect of
this is sudden lockup. If the CPU is designed to work with 2.0 volts and
starts only getting 1.5 volts, there is simply not enough energy to make
a definitive voltage split between high and low bits. When this occurs
during a memory write out, bit flips happen. If it happens on L1 cache,
the CPU horks up and the box will crash. L2 is often a bit more
forgiving as it can re-access the source (at the expense of time)
whereas L1 has no "memory" of the source. With non-ECC RAM, RAM-level
flips can cause anything from an application crash to a hard lock or
slow performance. ECC RAM helps but cannot correct more than a single
bit flip per write before a full resend is needed.
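
To make the single-bit limit concrete, here is a small Python sketch
using a plain Hamming(7,4) code. This is a stand-in chosen for brevity,
not the code any particular ECC memory controller uses; the function
names are made up for the example. It shows one flipped bit being
corrected and two flipped bits being silently "corrected" into the wrong
data.

```python
# Hamming(7,4): 4 data bits protected by 3 parity bits.
# Illustrative stand-in for ECC; real memory uses wider codes.

def encode(data):
    """Encode [d1, d2, d3, d4] as a 7-bit codeword (positions 1..7)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def decode(word):
    """Fix at most one flipped bit, then return the 4 data bits."""
    c = list(word)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3   # 1-based position of the suspect bit
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


def flip(word, positions):
    """Return a copy of word with the given 0-based bit positions inverted."""
    out = list(word)
    for pos in positions:
        out[pos] ^= 1
    return out


if __name__ == "__main__":
    data = [1, 0, 1, 1]
    word = encode(data)
    print("one flip recovered: ", decode(flip(word, [5])) == data)
    print("two flips recovered:", decode(flip(word, [1, 5])) == data)
```

A code with one extra overall parity bit can at least detect the
double flip instead of miscorrecting it, but it still cannot repair it.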
> 
> On 2/16/07, James P. Kinney III <jkinney at localnetsolutions.com> wrote:
>         On Thu, 2007-02-15 at 22:34 -0500, Jim Popovitch wrote:
>         > On Thu, 2007-02-15 at 22:12 -0500, Dow_Hurst wrote:
>         > > Jim,
>         > > Can you hear any fan bearings going bad?  Has someone
>         > > brought in a small heater that is on the same circuit?
>         >
>         > It's in a datacenter downtown, so I can't (yet) hear it
>         > ...and I don't feel like driving down there atm.  However an
>         > identical box, right next to this box, isn't having the same
>         > issues... so I think the "small heater" issue is out of the
>         > question.  The fan speed is different between the two boxes,
>         > but neither is out of range of its average over the past
>         > year.  All metrics are in spec, other than the occasional
>         > blip on the low voltage power lines.
>         
>         I would get a replacement PS in the box ASAP. Start double
>         checking the logs for intermittent memory errors and
>         non-fatal kernel faults. Over-voltages cause system damage.
>         Under-voltages cause data damage.
>         >
>         > -Jim P.
>         >
>         > > Dow
>         > >
>         > >
>         > > -----Original Message-----
>         > > >From: Jim Popovitch <jimpop at yahoo.com>
>         > > >Sent: Feb 15, 2007 9:35 PM 
>         > > >To: Atlanta Linux Enthusiasts <ale at ale.org>
>         > > >Subject: Re: [ale] PC power supply voltages
>         > > >
>         > > >On Thu, 2007-02-15 at 20:44 -0500, Scott Castaline wrote:
>         > > >> Jim Popovitch wrote:
>         > > >> > I've got a box where lm-sensors shows the +3.3 voltage
>         > > >> > going up and down between 3.39 and 3.41.  It jumps to
>         > > >> > 3.41 every ~1.5 hours and shortly thereafter drops back
>         > > >> > to 3.39 (which has been its normal voltage for the past
>         > > >> > year or so).  A second box right near it isn't
>         > > >> > experiencing this behavior, which started at 2pm today,
>         > > >> > and there are no other indications of any problems.
>         > > >> > How concerned should I be?
>         > > >> >
>         > > >> I don't think that it's a problem yet since the
>         > > >> fluctuation is within 5% of its nominal voltage of 3.3.
>         > > >> It could be an indication that something is starting to
>         > > >> go, but I've seen older power supplies do that and last
>         > > >> for several months to a couple of years.
>         > > >
>         > > >It gets worse. :-(
>         > > >
>         > > >-12V is -14.91 (outside range [-10.80:-13.18])
>         > > >VBat is 0.03 (outside range [2.40:3.60]) <- WTF is VBat? (it's a server)
>         > > >VCore is 1.35 (outside range [1.93:1.93])
>         > > >V5SB is 5.16 (outside range [4.86:5.14])
>         > > >
>         > > >
>         > > >-Jim P.
>         > >
>         > >
>         > > No sig.
>         > > _______________________________________________ 
>         > > Ale mailing list
>         > > Ale at ale.org
>         > > http://www.ale.org/mailman/listinfo/ale
>         --
>         James P. Kinney III
>         CEO & Director of Engineering 
>         Local Net Solutions,LLC
>         770-493-8244
>         http://www.localnetsolutions.com
>         
>         GPG ID: 829C6CA7 James P. Kinney III (M.S. Physics)
>         <jkinney at localnetsolutions.com>
>         Fingerprint = 3C9E 6366 54FC A3FE BA4D 0659 6190 ADC3 829C
>         6CA7
>         
>         
> 
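
For anyone who wants to run the same sanity checks on the numbers quoted
above, here is a rough Python sketch. The readings and limits are copied
from Jim's lm-sensors output; the function names and the dictionary
layout are made up for the example, and the 5% figure is the rule of
thumb Scott mentioned, not a value taken from any spec.

```python
# Readings and limits copied from the quoted lm-sensors output:
# name -> (reading, limit_a, limit_b). Helper names are hypothetical.
READINGS = {
    "-12V": (-14.91, -10.80, -13.18),
    "VBat": (0.03, 2.40, 3.60),
    "VCore": (1.35, 1.93, 1.93),
    "V5SB": (5.16, 4.86, 5.14),
}


def out_of_range(value, limit_a, limit_b):
    """True if value lies outside the limits, whichever order they are given."""
    low, high = min(limit_a, limit_b), max(limit_a, limit_b)
    return not (low <= value <= high)


def within_tolerance(value, nominal, fraction=0.05):
    """True if value is within +/- fraction of nominal (5% by default)."""
    return abs(value - nominal) <= abs(nominal) * fraction


def flagged(readings):
    """Return the names of all readings that fall outside their limits."""
    return [name for name, (value, a, b) in readings.items()
            if out_of_range(value, a, b)]


if __name__ == "__main__":
    print("out of range:", flagged(READINGS))
    for volts in (3.39, 3.41):
        print(f"+3.3V at {volts}: within 5%? {within_tolerance(volts, 3.3)}")
```

The +3.3 line passes the 5% check at both values, which matches Scott's
read; the other four rails all fail their own limits.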
-- 
James P. Kinney III          
CEO & Director of Engineering 
Local Net Solutions,LLC        
770-493-8244                    
http://www.localnetsolutions.com

GPG ID: 829C6CA7 James P. Kinney III (M.S. Physics)
<jkinney at localnetsolutions.com>
Fingerprint = 3C9E 6366 54FC A3FE BA4D 0659 6190 ADC3 829C 6CA7