[ale] fascinating data on temperature, including ATI / AMD Radeon gpu
Ron Frazier (ALE)
atllinuxenthinfo at techstarship.com
Wed Apr 24 20:47:12 EDT 2013
Hi all,
I have some additional temperature data I wanted to share with you based
on my further interaction with my systems and research. This is long,
but I think the info is very handy.
Suppose I asked you "what temperature can I run my cpu (or gpu) at and
be safe?" and you gave me a number. You'd almost certainly be wrong.
Suppose you asked someone else the same question and they gave you a
number. They'd almost certainly be wrong. Why? Because EVERY cpu model
has its own thermal design spec and maximum operating temperature.
Thus, any number you get is wrong for the vast majority of
parts. (That doesn't mean that two parts won't coincidentally share the
same number.) The only way to KNOW what your cpu can take is to look up
ITS specs from a credible source. The best source is the manufacturer's
website.
Here are the maximum operating temperatures, in deg C, for the 4 cpu
parts I have on hand.
AMD Athlon II x2 250 - 74 deg C
AMD Athlon II x3 460 - 75 deg C
AMD Phenom II x4 965 - 62 deg C (That's an insanely low number.)
AMD Phenom II x6 1045T - 71 deg C
As you can see, the numbers are all over the map. If I assume I can go
to 75 deg, or even 90 as some will tell you, I will be exceeding the
rated maximum on 3 of my 4 chips at 75, and on all 4 at 90. Some devices
can take 90 deg, but not these. There is no single number that works
well for all parts, except maybe 50 deg C. The good thing is that they
will probably shut down or throttle themselves before being destroyed,
but you still don't want to go there. An abrupt thermal shutdown could
corrupt the OS on your HDD.
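A quick way to check an assumed "safe" number against each chip's rated
maximum can be sketched in Python. This is only an illustration using the
four maximum temperatures listed above; the names TMAX_C and
chips_over_limit are made up for the example.

```python
# Rated maximum operating temperatures (deg C) for the four chips listed above.
TMAX_C = {
    "Athlon II x2 250": 74,
    "Athlon II x3 460": 75,
    "Phenom II x4 965": 62,
    "Phenom II x6 1045T": 71,
}


def chips_over_limit(assumed_limit_c, tmax=TMAX_C):
    """Return the chips whose rated maximum is below the assumed limit."""
    return sorted(name for name, limit in tmax.items() if limit < assumed_limit_c)


if __name__ == "__main__":
    for guess in (50, 75, 90):
        over = chips_over_limit(guess)
        print(f"Assumed limit {guess} C: {len(over)} of {len(TMAX_C)} chips exceeded")
```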
Here's where you can look up data for AMD desktop parts.
http://products.amd.com/pages/desktopcpuresult.aspx?AspxAutoDetectCookieSupport=1
I don't have much experience with Intel chips, although my laptops have
them. However, these resources should help you determine the maximum
temperature for them.
http://www.intel.com/support/processors/sb/CS-033342.htm
http://www.intel.com/support/processors/sb/CS-032341.htm
http://ark.intel.com/
So, once you know your max temperature, how do you make sure it's not
exceeded and that your fans stay at least relatively quiet?
I tried to quickly find authoritative data on lifetime versus
temperature. I couldn't find much in a short time. This article has
some good data, but doesn't address lifetime too much.
http://www.overclock.net/t/476469/the-truth-about-temperatures-and-voltages
Having been unable to find authoritative data in the time I allocated to
write this, I'll give you my opinion. It is strictly that, my opinion.
Others are free to disagree or prove me wrong.
My opinion is that any solid state component in my system should be fine
if I stay at least 15 degrees below the maximum limits listed.
Mechanical devices (hdd's, optical drives, floppy drives) are a whole
other matter.
In my opinion, with proper ventilation, the PC should be able to run
almost indefinitely at full load at Tmax - 15. I don't believe I'm
shortening the life substantially. Again, I could be wrong.
Having said that, I don't max my systems out unless I have a reason,
like mining, or video rendering, that I want to accomplish that requires
all that horsepower.
As you may know, the cpu coolers that come with cpu's are not the
greatest, but they can (usually) get the job done. They typically have
a little 3" fan on top of a heat sink. The main problem, for me, is
that once the fan spins up to about 5000 rpm, it makes an annoying
whining noise. At this point, I don't want to buy an aftermarket cooler.
So, I wanted to make sure my system didn't overheat, but also wanted it
to be as quiet as possible.
There are certainly various utilities out there for fan and temperature
control, but I want to mention what you might have built into your bios.
The bios on my desktop main boards is from AMI. It has several
features for temperature control. (For my laptops, I just let them do
what they want, but I monitor the temperature.) I have my power
settings set for active cooling, which increases fan speed before
throttling the processor. I also have the processor set to throttle
down to as little as 20% frequency when not active.
In my bios, there is a feature which I have to turn on called cpu smart
temp, or something like that. Once it's turned on, I can set a
temperature target for the cpu. The system rounds to 5 degree
increments. The number I put in is Tmax - 15. So, for the Phenom II
x4, this is 62 - 15 = 47. This rounds to 45. For the Phenom II x6,
this is 71 - 15 = 56. This rounds to 55. Note that it's customized on
each PC for that chip.
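The target arithmetic can be sketched in Python. The rounding direction is
an assumption here: the bios only says it works in 5 degree increments, so
this sketch rounds down, which errs on the cool side and reproduces both
numbers above. The function name and the margin_c / step_c parameters are
illustrative only.

```python
def bios_target_c(tmax_c, margin_c=15, step_c=5):
    """Tmax minus a safety margin, rounded DOWN to a multiple of step_c.

    Rounding down is an assumption, not documented bios behavior.
    """
    raw = tmax_c - margin_c
    return (raw // step_c) * step_c


if __name__ == "__main__":
    for chip, tmax in (("Phenom II x4", 62), ("Phenom II x6", 71)):
        print(f"{chip}: Tmax {tmax} C -> target {bios_target_c(tmax)} C")
```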
There is also a minimum cpu fan setting which I set to 50%.
Thus, if the cpu is idling, as it is at the moment, and its temperature
is 40 deg, the cpu fan will be moseying along at 50% of maximum speed or
about 2800 rpm. At this speed, it's relatively quiet. If I start
taxing the system and the temperature approaches the limits I set, the
fan will wind up, ultimately running about 5600 rpm. This will keep the
system within the limits I've set, or close to them.
This is quite noisy at full cpu speed. However, I'd rather have noisy
and cool versus quiet and hot.
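To picture the behavior described above, here is a rough Python sketch of a
fan curve. It is NOT the actual bios algorithm, which isn't documented
here. It assumes the fan holds the minimum setting while the cpu is cool
and ramps linearly to 100% over the last few degrees below the target. The
ramp_c width and the linear rpm scaling are hypothetical.

```python
def fan_duty_percent(temp_c, target_c, min_duty=50, ramp_c=10):
    """Hypothetical fan curve: hold min_duty while cool, then ramp linearly
    to 100% over the last ramp_c degrees below target_c."""
    ramp_start = target_c - ramp_c
    if temp_c <= ramp_start:
        return float(min_duty)
    if temp_c >= target_c:
        return 100.0
    fraction = (temp_c - ramp_start) / ramp_c
    return min_duty + fraction * (100 - min_duty)


def fan_rpm(duty_percent, max_rpm=5600):
    """Approximate fan speed, assuming rpm scales linearly with duty."""
    return duty_percent / 100 * max_rpm


if __name__ == "__main__":
    for temp in (40, 50, 55):
        duty = fan_duty_percent(temp, target_c=55)
        print(f"{temp} C -> {duty:.0f}% duty, about {fan_rpm(duty):.0f} rpm")
```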
I have tested my system by stressing it with Prime95. This is a program
which loads the cpu heavily by searching for Mersenne primes. If you
wish, you can contribute to a worldwide scientific effort to find the
primes, but you don't have to. You can just use it to test your system.
It's available for almost any OS, including Linux. If you'd like
information on how to use this, you can contact me. One cool thing is
that you can turn individual cpu cores on and off. So, you can partially
load the system if you want.
http://www.mersenne.org/
For my Phenom II x6 system, which is using the stock air cooler, the cpu
temp reaches a max of 58 degrees under full load with a maximum
specified temperature of 71. This is a 13 deg delta. I have no qualms
about running this thing full blast continuously if I have a reason to.
I do not, however, run Prime95 all the time, so it's usually idling.
My Phenom II x4 system is another matter. It simultaneously has a 62
deg max temperature and a 125 W power dissipation. Bad combination. I
was never able to guarantee a 15 deg delta below the max with the stock
air cooler. I have a Corsair H70 liquid cooling unit. It has a combined
heat sink and liquid pump that fits on the cpu, connected to a radiator
with two 120 mm fans. This cooler WILL keep the monster cool. Under full
load, this cpu gets to 46 deg with a maximum of 62 deg. This is a 16
deg delta. Again, I have no qualms about running it full blast.
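The deltas for both systems can be checked with a few lines of Python. The
figures are the ones quoted above; the function names and the 15 degree
default are just this example's way of expressing the Tmax - 15 rule of
thumb.

```python
def headroom_c(tmax_c, measured_c):
    """Degrees of margin between the measured full-load temperature and Tmax."""
    return tmax_c - measured_c


def meets_margin(tmax_c, measured_c, margin_c=15):
    """True if the full-load temperature stays at least margin_c below Tmax."""
    return headroom_c(tmax_c, measured_c) >= margin_c


# (description, Tmax in deg C, measured full-load temperature in deg C)
SYSTEMS = (
    ("Phenom II x6, stock air cooler", 71, 58),
    ("Phenom II x4, Corsair H70", 62, 46),
)

if __name__ == "__main__":
    for name, tmax, measured in SYSTEMS:
        delta = headroom_c(tmax, measured)
        status = "meets" if meets_margin(tmax, measured) else "is short of"
        print(f"{name}: {delta} deg delta, {status} the 15 deg margin")
```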
There are bios settings for the case fans as well. It is my preference
to have them running full speed all the time, so I set that to 100%. I
don't want any chance of thermal runaway in the active components.
With a liquid cooling unit, there is a decision you have to make about
which fan port to connect the liquid pump to and which one to connect
the radiator fan(s) to. I chose to connect the liquid pump to a case
fan port, which is running at 100% all the time. I don't want the
liquid pump spooling down. I don't even know if it can be spooled down.
I then connected the radiator fans to what was originally the cpu fan
port. This is the port associated with the smart cpu temp setting in
the bios. So, the radiator fans WILL spool down to 50% when the system
is relatively dormant and cool. When the cpu is taxed, they will spool
up to their max just as a normal cpu fan would.
This bios also has a 'cool and quiet' function in the cpu section and a
number of 'green power' functions which adjust the different power
phases to be more efficient. I turned all this on. I don't want the PC
shutting down or going into standby mode, but I'm fine with it doing
things behind the scenes transparently.
I know the following has been mentioned before, but a great little
device to monitor power consumption is the
Kill-A-Watt EZ
http://www.homedepot.com/p/P3-International-Kill-A-Watt-EZ-Meter-P4460/202196388
You want the EZ model, which is more advanced than their original
design. With this one, not only can you monitor instant power usage,
but you can program in your electric rates and it will tabulate
cumulative cost of usage over time.
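The same kind of cost bookkeeping can be sketched in Python for a rough
estimate before measuring. This is not how the meter itself works
internally, just energy times rate. The wattage, hours, and electric rate
in the example are made-up numbers.

```python
def energy_kwh(watts, hours):
    """Energy used by a constant load, in kilowatt-hours."""
    return watts / 1000 * hours


def energy_cost(watts, hours, rate_per_kwh):
    """Cost of running a constant load for the given time."""
    return energy_kwh(watts, hours) * rate_per_kwh


if __name__ == "__main__":
    # Hypothetical example: a 200 W load running 24 hours a day for 30 days
    # at an assumed rate of 0.10 per kWh.
    cost = energy_cost(watts=200, hours=24 * 30, rate_per_kwh=0.10)
    print(f"Estimated cost: {cost:.2f}")
```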
I hope this info is useful and that it will help you keep your cool.
Sincerely,
Ron
On 4/21/2013 2:01 PM, Jim Kinney wrote:
> One of the fun parts of temp monitoring is when the sensors must be
> calibrated. Most chips "know" the scale factors but some are off a
> bit. So the driver makes the change. With a Linux system, you can feed a
> bunch of scale-factor params to the startup of lm_sensors. Tyan used to
> provide the lm_sensor data they had tested for best accuracy on their
> boards. Not sure if other makers do or not.
>
>
> On Sun, Apr 21, 2013 at 12:38 AM, Ron Frazier (ALE)
> <atllinuxenthinfo at techstarship.com
> <mailto:atllinuxenthinfo at techstarship.com>> wrote:
>
> Hi all,
>
> The topic of monitoring temperatures in a PC comes up here
> periodically. As I mentioned in other threads, I've been working
> with graphics cards on a Mint installation for cryptocurrency
> computations. As you may know from my previous posts, I've always
> wanted to keep an eye on the status of my systems. In the process
> of working with this project, I've discovered a number of
> interesting pieces of information that I thought I'd share.
>
> Take a look at this image:
>
> https://dl.dropboxusercontent.com/u/9879631/sensors-sample1.png
>
> This shows a part of my screen on my Mint system. Note my Gnome
> panel at the top with a temperature monitor on it. This is the
> hardware monitor widget that is available in Gnome. However, when
> I installed the ATI / AMD graphics drivers, the sensor system was
> no longer able to monitor the cpu. After a bit of googling, I was
> directed to lm-sensors. Many of you are already aware of that. I
> tried this command.
>
> --> sudo apt-get install lm-sensors
>
> I found that it was already installed.
>
> I then found and issued these two commands to reinitialize the system.
>
> --> sudo sensors-detect
>
> I accepted the defaults here then told it to save the changes.
>
> --> sudo service module-init-tools start
>
> I think that allowed the changes to take effect without a reboot.
>
> This allowed the sensor system to work again, and my panel widgets
> to read both the cpu temperature and the hard drive temperatures
> as shown in the image.
>
> You can use this command to read the sensors once in a terminal
> window.
>
> --> sensors
>
> This command will read the sensors every few seconds and display
> the results continuously.
>
> --> watch sensors
>
> I searched for a while to find a utility to read the gpu
> temperatures. I found nothing for a while. Then I discovered
> that it's built into the ATI / AMD driver. I don't know how to do
> this with nvidia cards.
>
> The following command will read the clock speed and load on the
> first gpu.
>
> --> aticonfig --adapter=0 --od-getclocks
>
> The following command will read and display the results continuously.
>
> --> watch aticonfig --adapter=0 --od-getclocks
>
> The following command will read the temperature of the first gpu.
>
> --> aticonfig --adapter=0 --odgt
>
> The following command will read and display the results continuously.
>
> --> watch aticonfig --adapter=0 --odgt
>
> Once I found this out, I modified my mining program to add a
> temperature status window for each gpu so I could keep an eye on
> the temperature. This script file shows how I did it.
>
> https://dl.dropboxusercontent.com/u/9879631/start-miners
>
> If you look at these images, I also discovered something very
> interesting. The first one is the same as the one mentioned
> above, including the temperature readings of the GPU's on my Mint
> machine. The second is an image of the temperature readings of
> the GPU's on my Windows machine.
>
> https://dl.dropboxusercontent.com/u/9879631/sensors-sample1.png
> https://dl.dropboxusercontent.com/u/9879631/sensors-sample2.png
>
> All the gpu's are being run at close to 100% load, and the cases
> of both computers are well ventilated with multiple fans.
>
> Look at the Miner 1 temperature window in image 1. This is an MSI
> 7850 gpu running in the Mint machine. It's running at 73 deg C.
>
> Now, look at the right hand window in image 2. This is an
> IDENTICAL MSI 7850 gpu running in the Windows machine. It's
> running at 62 deg C.
>
> Like I said, they're identical cards running in almost identical
> conditions. So why is one running 11 degrees hotter than the other?
>
> This was puzzling me for a while but I think I've figured it out.
>
> In the Linux machine, the MSI card is the TOP one in the
> chassis. That means its intake fan is right next to the 2nd gpu,
> with only about 1/8" of space between. So, its air flow is very
> restricted. That's the card that's running hotter.
>
> In the Windows machine, the MSI card is the SECOND one in the
> chassis. It has several inches of air gap to the next object.
> It's the one that is running cooler.
>
> Now look at each image and compare the readings for each card
> within the same computer.
>
> In image 1, the Mint machine, Miner 1, the top card, is at 73 deg
> C. Miner 2, the bottom card, is at 57 deg C.
>
> In image 2, the Windows machine, the left window is an Asus 7850
> card, and is the top card. It's at 75 deg C. The right window,
> the MSI card, is in the bottom slot. It's running at 62 deg C.
>
> So, in one case, the top card is running 16 degrees hotter. In
> the other case, the top card is running 13 degrees hotter.
>
> Based on this, I am convinced that any gpu or other card with its
> own fan on the side will run substantially hotter than its
> baseline temperature if it's next to another card.
>
> I'm not quite sure what to do about it. I think 75 deg C is OK,
> but not great. For what it's worth, I think my AMD cpu's are
> rated at about 67 deg C. Apparently, the gpu's have more
> tolerance. You can see in image 2 that the fans on the gpu's in
> the Windows system are only running at about 40% of their max,
> assuming that GPU-Z is reading them right. So, maybe the card is
> not too unhappy. But, it may mean the card would be pushed over
> its thermal limits much faster if a case fan fails, or if the room
> ambient temperature rises too much.
>
> Anyway, I found this fascinating. I guess I'll just have to keep
> a close eye on any PCI-E cards with fans which are jammed up
> against other cards.
>
> PS I think I was monitoring the wrong temperature for CPU on my
> desktop machine for years. The MSI motherboards have a 2 digit
> led display on the board which monitors post codes and then
> temperature once the machine is running. I was monitoring the
> sensor that matched that reading. When I ran the AMD Overdrive
> utility, it came up with a different, lower, number for CPU
> temperature, so I started monitoring that instead. I don't know
> exactly which temperature the motherboard display is
> monitoring.
>
> PPS I took some of the text in this email from the Linux machine
> to the Windows machine to write the email. When I tried to open
> it up in notepad, I just got one long line of text with no breaks,
> since Windows has different line breaks. However, I found out
> that I could open it in Wordpad and it worked OK. Then, I could
> copy it into this email.
>
> Let me know what your experiences have been monitoring and
> controlling temperature.
>
> Hope this is helpful.
>
> Sincerely,
>
> Ron