<p>Given the high core count, narrow the CPU focus to the 16-core Opterons.</p>
<div class="gmail_quote">On Jul 27, 2012 4:11 PM, "John Heim" <<a href="mailto:john@johnheim.net">john@johnheim.net</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
From: "Jeff Layton" &lt;<a href="mailto:laytonjb@att.net">laytonjb@att.net</a>&gt;<br>
To: "Atlanta Linux Enthusiasts" &lt;<a href="mailto:ale@ale.org">ale@ale.org</a>&gt;<br>
><br>
> The follow-up question is whether the FFT's are done locally<br>
> or if they are using an MPI based FFT?<br>
><br>
> However, I think as a starting point, you'll want compute nodes<br>
> that have reasonably fast processors, lots of cache (as Jim<br>
> pointed out) but you also need tons of memory BW per core.<br>
> FFT's love memory BW!!<br>
><br>
> If the FFT's themselves are parallelized, then you will definitely<br>
> need InfiniBand. FFT's eat networks for breakfast (in fact there<br>
> was a proposal from John Gustafson at Intel to make a 3D MPI<br>
> FFT the new benchmark for HPC since it pushed systems so<br>
> hard).<br>
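<br>
To put a rough number on the memory bandwidth point above, here is a<br>
back-of-the-envelope sketch. It assumes double-precision complex data<br>
(16 bytes per point), the common 5*N*log2(N) flop estimate for a complex<br>
FFT, and one full read plus one full write of the array per dimension.<br>
The function name and all three assumptions are illustrative, not<br>
measurements from the PI's code.<br>

```python
import math

def fft3d_intensity(n, bytes_per_point=16, passes=3):
    """Estimated flops per byte of memory traffic for an n x n x n complex FFT.

    Assumptions (hypothetical, for illustration only):
      - 5 * N * log2(N) flops for N total points
      - each of the `passes` dimension sweeps reads and writes the whole array
    """
    points = n ** 3
    flops = 5 * points * math.log2(points)
    traffic = passes * 2 * bytes_per_point * points  # read + write per pass
    return flops / traffic

for n in (128, 256, 512):
    print(n, round(fft3d_intensity(n), 2))
```

Under these assumptions even a 512^3 transform does only about 1.4 flops<br>
per byte moved, so memory bandwidth per core, not peak flops, is likely<br>
to set the pace.<br>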
<br>
I sent the PI your questions. Here are his answers (somewhat abbreviated<br>
and w/o personal info).<br>
<br>
1. 2D FFT's? 3D FFT's?<br>
<br>
Both. Probably 3D more often than 2D. But I am working on code right<br>
now that would always be 2D (never 3D).<br>
<br>
2. Is the code parallelized via MPI or OpenMP or both?<br>
<br>
We have never bothered to explicitly parallelize our code. We have been<br>
using the built-in parallelization in calls to FFTW.<br>
<br>
3. Is the code written with CUDA?<br>
<br>
No.<br>
<br>
4. How many cores or processes are used per run?<br>
<br>
We need to have the capability to use at least 64 cores per run, maybe 128<br>
or 256 if possible within our budget.<br>
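<br>
As a quick sizing sketch for those core counts: the snippet below assumes<br>
16-core Opteron sockets and four sockets per node. The four-socket figure<br>
is an assumption for illustration, not something stated in this thread.<br>
Note that FFTW's threaded transforms run inside one shared-memory node,<br>
so a run spanning more than one node would need FFTW's MPI transforms or<br>
some other distributed approach.<br>

```python
import math

CORES_PER_SOCKET = 16  # 16-core Opteron
SOCKETS_PER_NODE = 4   # assumption: quad-socket nodes

def nodes_needed(cores):
    """Return (sockets, nodes) needed to supply `cores` cores."""
    sockets = math.ceil(cores / CORES_PER_SOCKET)
    nodes = math.ceil(sockets / SOCKETS_PER_NODE)
    return sockets, nodes

for cores in (64, 128, 256):
    sockets, nodes = nodes_needed(cores)
    print(f"{cores} cores: {sockets} sockets, {nodes} node(s)")
```

With these assumptions only the 64-core case fits in a single node.<br>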
<br>
5. Which compilers do you use or like?<br>
<br>
I think ifort is everybody's favorite. I use the gnu g95 compiler<br>
sometimes, but I think it produces slower object modules than ifort.<br>
<br>
6. How large are the input/output files?<br>
<br>
I create 10 Gb of output data from a serial run on my desktop iMac<br>
(although it has typically been closer to 1 Gb per run since I have limited<br>
disk space here). So if I had a big parallel run on 64 or more cores, I can<br>
imagine I could be creating 100 Gb of output<br>
data pretty easily and maybe even 1 Tb or more.<br>
<br>
_______________________________________________<br>
Ale mailing list<br>
<a href="mailto:Ale@ale.org">Ale@ale.org</a><br>
<a href="http://mail.ale.org/mailman/listinfo/ale" target="_blank">http://mail.ale.org/mailman/listinfo/ale</a><br>
See JOBS, ANNOUNCE and SCHOOLS lists at<br>
<a href="http://mail.ale.org/mailman/listinfo" target="_blank">http://mail.ale.org/mailman/listinfo</a><br>
</blockquote></div>