[ale] HPC 101?

Leam Hall leamhall at gmail.com
Sun Dec 8 10:58:05 EST 2024


/me scribbles notes

Jim, thanks! You were the first person I thought of. I sent it to the list because others may have the same question. I've been on the periphery of HPC off and on, and have worked bigger non-HPC sites. Even did a SLES on Z Proof of Concept years ago.

Leam



On 12/8/24 09:32, Jim Kinney via Ale wrote:
> You just missed Supercomputing 2024 (SC24) in Atlanta.
> 
> A major component of HPC is the ability to coordinate computation across
> multiple physical compute nodes. This is coordinated by message passing
> (MPI), with MPICH being a top contender among the implementations.
> 
> Once the compute nodes are running the same code all at once, the big
> bottleneck is next - filesystem IO.  Big filesystems, petabyte range, have
> challenges with locking, and all have a hard time with small files. With
> new drives defaulting to a 4k block size and many hundreds or thousands of
> data files in the 1-2k size, the metadata operations turn into a choke
> point. This is mostly a coding problem as devs don't always write for the
> specific hardware nearly as well as is needed. The analogy I used once was
> "unless the engine in that Ferrari was designed for regular gas, you will
> not get Ferrari performance if you fill it with QT regular gas".
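
A rough sketch of the arithmetic behind the small-file point above. It assumes every file occupies whole 4k blocks (some filesystems pack or inline tiny files, so treat this as an upper bound), and the file sizes are made-up illustrative values, not measurements. It only covers wasted space; the metadata cost per file comes on top of that.

```python
BLOCK_SIZE = 4096  # bytes; the 4k block size mentioned above


def allocated_bytes(file_size, block_size=BLOCK_SIZE):
    """Bytes consumed on disk if a file always occupies whole blocks (assumption)."""
    # Integer ceiling division: number of blocks needed, times the block size.
    return -(-file_size // block_size) * block_size


def wasted_fraction(file_sizes, block_size=BLOCK_SIZE):
    """Fraction of allocated space that holds no data."""
    used = sum(file_sizes)
    allocated = sum(allocated_bytes(s, block_size) for s in file_sizes)
    return 1 - used / allocated


# Hypothetical workload: 3000 files between 1k and 2k.
sizes = [1024, 1536, 2048] * 1000
print(f"files: {len(sizes)}, wasted: {wasted_fraction(sizes):.1%}")
```

Every one of those files also needs its own create/open/close, so the number of metadata operations scales with the file count rather than with the amount of data.
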
> 
> Monitoring tools are essential. You need to know if a node is being slow, as
> an MPI job runs only as fast as its slowest node.
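
The "slowest node" effect can be sketched with a toy model. This assumes a job where all ranks synchronize at the end of each step, so the step takes as long as the slowest node. The node names, timings, and the 1.5x threshold are hypothetical, and a real site would pull these numbers from its monitoring stack.

```python
from statistics import median


def step_time(node_times):
    """Wall-clock time of one synchronized step: everyone waits for the slowest."""
    return max(node_times.values())


def efficiency(node_times):
    """Average useful time per node divided by the time all nodes were held."""
    times = list(node_times.values())
    return (sum(times) / len(times)) / max(times)


def slow_nodes(node_times, factor=1.5):
    """Names of nodes slower than `factor` times the median (arbitrary cutoff)."""
    cutoff = factor * median(node_times.values())
    return sorted(name for name, t in node_times.items() if t > cutoff)


# Hypothetical per-step timings in seconds: seven healthy nodes, one straggler.
timings = {f"node{i:02d}": 10.0 for i in range(1, 8)}
timings["node08"] = 20.0

print(step_time(timings), efficiency(timings), slow_nodes(timings))
```

In this model one node running at half speed doubles the step time for all eight nodes, which is why per-node monitoring matters.
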
> 
> Job scheduling - Slurm. Don't waste time with anything else unless a
> hyperspecific need for another scheduler is made mandatory. In that case,
> push back like mad until Slurm is chosen anyway. All schedulers are broken
> in some way. But Slurm solved the brokenness of all the others to make its
> own little problems.
> 
> On Sun, Dec 8, 2024, 8:38 AM Leam Hall via Ale <ale at ale.org> wrote:
> 
>> I'd like to learn more about HPC and Linux. Anyone have resources to share?
>>
>> Thanks!
>>
>> Leam
>> _______________________________________________
>> Ale mailing list
>> Ale at ale.org
>> https://mail.ale.org/mailman/listinfo/ale
>> See JOBS, ANNOUNCE and SCHOOLS lists at
>> http://mail.ale.org/mailman/listinfo
>>
> 
> 

-- 
Linux Software Engineer   (reuel.net/resume)
Scribe: The Domici War    (domiciwar.net)
Coding Ne'er-do-well      (github.com/LeamHall)

Between "can" and "can't" is a gap of "I don't know", a place of discovery. For the passionate, much of "can't" falls into "yet". -- lh

Practice allows options and foresight. -- lh

