<p>Eigen or Intel MKL (if you&#39;re on Intel hardware), used together with OpenMP, should help.</p>

<p> <a href="http://eigen.tuxfamily.org/dox-devel/GettingStarted.html">http://eigen.tuxfamily.org/dox-devel/GettingStarted.html</a><br>
<a href="http://software.intel.com/en-us/intel-mkl">http://software.intel.com/en-us/intel-mkl</a></p>

<p> <a href="http://openmp.org/wp/">http://openmp.org/wp/</a></p>
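<p>For the element-wise subtraction case specifically, here is a minimal sketch of what the OpenMP route might look like in C++. The function name <code>subtract_elementwise</code> and the flat row-major storage are my own assumptions for illustration, not anything from Eigen or MKL. With GCC you would build it with <code>-fopenmp</code>; without that flag the pragma is ignored and the loop simply runs serially.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: element-wise difference of two rows x cols matrices,
// each stored row-major in a flat std::vector<double>.
std::vector<double> subtract_elementwise(const std::vector<double>& a,
                                         const std::vector<double>& b,
                                         std::size_t rows, std::size_t cols) {
    // Both inputs must hold exactly rows * cols elements.
    assert(a.size() == rows * cols && b.size() == rows * cols);

    std::vector<double> out(rows * cols);
    const long long n = static_cast<long long>(rows * cols);

    // Every iteration writes a distinct element and reads nothing written by
    // another iteration, so OpenMP can split the index range across threads.
    #pragma omp parallel for
    for (long long i = 0; i < n; ++i) {
        const std::size_t k = static_cast<std::size_t>(i);
        out[k] = a[k] - b[k];
    }
    return out;
}
```

<p>Calling <code>subtract_elementwise(a, b, rows, cols)</code> returns a new vector holding <code>a[i] - b[i]</code> for every element. Flattening the two nested loops into one gives the runtime a single large range to divide up. Bear in mind that a plain subtraction like this tends to be limited by memory bandwidth, so the speedup may well be smaller than the core count.</p>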

<div class="gmail_quote">On Mar 8, 2013 11:35 AM, &quot;Jeff Hubbs&quot; &lt;<a href="mailto:jhubbslist@att.net">jhubbslist@att.net</a>&gt; wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

My *practical* experience has a hole in it when it comes to developing software to efficiently use multiple cores in a machine.<br>

<br>

If I&#39;m writing code in the likes of C++, Python, or Fortran (acknowledging that I&#39;ve got a range of programming paradigms there) and let&#39;s say that I&#39;m subtracting two 2-D arrays of floating point numbers from one another element-wise, how is it that the operation gets blown across multiple CPU cores in an efficient way, if at all?  Bear in mind that if this is done in Fortran, it&#39;s done in a pair of nested do-loops so unless the compiler is really smart, that becomes a serial operation.<br>



</blockquote></div>