[ale] OT: Multi-core Utilization
Alex Carver
agcarver+ale at acarver.net
Fri Mar 8 11:48:45 EST 2013
On 3/8/2013 08:33, Jeff Hubbs wrote:
> My *practical* experience has a hole in it when it comes to developing
> software to efficiently use multiple cores in a machine.
>
> If I'm writing code in the likes of C++, Python, or Fortran
> (acknowledging that I've got a range of programming paradigms there) and
> let's say that I'm subtracting two 2-D arrays of floating point numbers
> from one another element-wise, how is it that the operation gets blown
> across multiple CPU cores in an efficient way, if at all? Bear in mind
> that if this is done in Fortran, it's done in a pair of nested do-loops
> so unless the compiler is really smart, that becomes a serial operation.
Depending on who (the writer or the compiler) is optimizing the code,
who knows? :)
The sensible way to do it would be to exploit the fact that matrix
addition and subtraction require the matrices to be identical in
dimension. So you send a row (or column) of each matrix to each core
and let it rip through one short loop iterating over the elements of
that row or column. Now you have parallel single for{} loops, each
chewing on only one row/column. The outer iteration is only needed to
divvy up the work across the cores and executes enough times to hand
out all the rows/columns. (The number of iterations of that outer loop
is the number of rows/columns divided by the number of cores available,
rounded up to the nearest integer.) A rough sketch is below.
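For concreteness, here's a minimal Python sketch of that row-splitting
scheme using the standard multiprocessing module. The names
(subtract_rows, parallel_subtract, chunk) are just illustrative, not
from any particular library, and this is only one way to divvy up the
rows:

import math
from multiprocessing import Pool, cpu_count

def subtract_rows(args):
    # Subtract one chunk of rows element-wise: C_chunk = A_chunk - B_chunk
    a_rows, b_rows = args
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(a_rows, b_rows)]

def parallel_subtract(A, B):
    # Element-wise A - B with rows divvied up across the available cores
    n = len(A)
    cores = cpu_count()
    chunk = math.ceil(n / cores)          # rows per core, rounded up
    # One task per core: a contiguous slice of rows from each matrix
    tasks = [(A[i:i + chunk], B[i:i + chunk]) for i in range(0, n, chunk)]
    with Pool(cores) as pool:
        pieces = pool.map(subtract_rows, tasks)
    # Stitch the row chunks back together in order
    return [row for piece in pieces for row in piece]

if __name__ == "__main__":
    A = [[float(i + j) for j in range(4)] for i in range(4)]
    B = [[1.0] * 4 for _ in range(4)]
    print(parallel_subtract(A, B))

For tiny matrices the process start-up cost swamps the arithmetic, so
the split only pays off once the rows are long enough to keep each core
busy.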
...or you cheat and send it to the GPU which knows how to natively work
with matrices and blows the CPU out of the water. :)