[ale] fascinating data on temperature, including ATI / AMD Radeon gpu

Alex Carver agcarver+ale at acarver.net
Thu Apr 25 02:53:50 EDT 2013


On 4/24/2013 22:57, Ron Frazier (ALE) wrote:

> Hi Alex,
>
> Wow.  Thanks for posting that great technical data.  Sorry if I misstated
> things.  I guess there's a lot going on under the chip's cover.
>
> The temperature I'm measuring is what's reported by the AMD OverDrive
> utility as "temperature".  It's the same number SpeedFan reports as
> "core" temperature.  So, I'm assuming that's the on-die temperature and the
> same one that can only go to 71 deg C, using my Phenom II x6 as an example.

No, the sensor is not on the die itself, just near it.  RTDs are too 
expensive in terms of chip real estate to put one on the die.  The 
ratings on specification sheets are external maximums, not internal ones. 
If the external temperature is 71 C, then the die is likely 
sitting at over 150 C.  That's quite hot and some interesting physics 
begins to occur inside the silicon when you reach those numbers.

> So, what you're telling me is that my CPUs, memory, etc., will just
> spontaneously fail, even if they're not zapped by power surges and such.

Yes, eventually all electronics that run hot and stay hot will fail. 
How long that takes is a function of the peak and sustained temperatures 
because the damage is cumulative.
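
A rough way to put numbers on "a function of temperature" is the Arrhenius 
acceleration model that is widely used in reliability engineering.  The 
sketch below is illustrative only: the 0.7 eV activation energy is an 
assumed ballpark value, not a figure for any particular chip, the function 
name is made up for this example, and the reported sensor readings stand in 
for the real junction temperature.  It gives a relative wear-out rate 
between two temperatures, not a lifespan for an individual part.

```python
import math

# Boltzmann constant in eV per kelvin.
BOLTZMANN_EV_PER_K = 8.617333e-5


def arrhenius_acceleration(cool_c, hot_c, activation_energy_ev=0.7):
    """Return how many times faster thermally driven wear-out proceeds
    at hot_c than at cool_c (both in degrees Celsius).

    activation_energy_ev is an ASSUMED value; real failure mechanisms
    each have their own activation energy.
    """
    cool_k = cool_c + 273.15
    hot_k = hot_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / cool_k - 1.0 / hot_k
    )
    return math.exp(exponent)


if __name__ == "__main__":
    # Compare the two operating points from the question: 40 C vs 56 C.
    factor = arrhenius_acceleration(40.0, 56.0)
    print(f"Relative aging rate at 56 C vs 40 C: {factor:.2f}x")
```

Under those assumptions, running at 56 C instead of 40 C ages the part 
roughly 3.5 times faster.  A different activation energy changes that ratio 
considerably, and it still says nothing about when one specific chip will 
fail.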

> I'm sitting next to a vintage 2002 laptop with a Pentium 4 chip.  At the
> moment, it still works, but I don't use it that often or that hard.
> Anyway, I've definitely been known to keep computers for 10 years.

I've got plenty of old machines, too.  I just don't abuse them too much 
but it doesn't stop me from using them.

> If you have any idea, about how long would a Phenom II x6 be expected to
> last if it's always running below 40 deg C versus if it's always running
> at Tmax - 15 or 56 deg C?

Without destructive testing it's impossible to predict.  It will just 
happen.  There are too many factors involved that determine the final 
threshold of failure.  For example, a P-N junction could have been 
marginal on your chip so it has a shorter lifespan.  Perhaps one of the 
interconnects was annealed too long and has a big spike ready to punch 
through.  On the other hand you could have absolutely flawless junctions 
and interconnects and the chip will last for decades.  There's just no 
way to know without performing a microscopic examination of the die 
itself at all points.

The thing to remember is that these are very nearly commodities now. 
Just use it, do your best to give it a reasonable operating environment 
and then, when it finally falls over, get a new one.  If it were 
something like some of my lab equipment, I'd do everything I could 
to hang onto it.  But a processor and motherboard for an average 
desktop just aren't expensive enough anymore to worry about much.  I'd 
go the water-cooled route or use large-diameter ball-bearing fans only 
to keep my home office quiet (35 fans and as many hard drives going in 
here).  As long as I'm not abusing the systems, I'll get a 
reasonable life out of them.

> I noticed you said they "considered" adding cosmic ray detection but it
> sounds like they haven't.

Yes, because they still have to design the detectors and the control 
circuitry, test the designs, implement them, and then see if they can get 
them into production.  The design and production cycle on CPUs is about 
seven years.  Chips that are being designed today won't show up for 
another 7-10 years.

