<div dir="auto">HPC is enterprise-like computing on steroids for scale and with tighter deadlines (milliseconds) and much worse code 😆</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Dec 8, 2024, 10:58 AM Leam Hall via Ale <<a href="mailto:ale@ale.org">ale@ale.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">/me scribbles notes<br>
<br>
Jim, thanks! You were the first person I thought of. I sent it to the list because others may have the same question. I've been on the periphery of HPC off and on, and have worked bigger non-HPC sites. Even did a SLES on Z Proof of Concept years ago.<br>
<br>
Leam<br>
<br>
<br>
<br>
On 12/8/24 09:32, Jim Kinney via Ale wrote:<br>
> You just missed Supercomputing 24 (SC24) in Atlanta.<br>
> <br>
> A major component of HPC is the ability to coordinate computation across<br>
> multiple physical compute nodes. This is coordinated by message passing<br>
> (MPI), with MPICH being a top contender among the implementations.<br>
> <br>
> Once the compute nodes are running the same code all at once, the big<br>
> bottleneck is next - filesystem IO. Big filesystems, petabyte range, have<br>
> challenges with locking, and all have a hard time with small files. With<br>
> new drives defaulting to a 4k block size and many hundreds or thousands of<br>
> data files in the 1-2k size, the metadata operations turn into a choke<br>
> point. This is mostly a coding problem as devs don't always write for the<br>
> specific hardware nearly as well as is needed. The analogy I used once was<br>
> "unless the engine in that Ferrari was designed for regular gas, you will<br>
> not get Ferrari performance if you fill it with QT regular gas".<br>
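<br>
A rough sketch of the small-file arithmetic above, in Python. It assumes space is allocated in whole 4k blocks and ignores filesystems that store tiny files inline or pack tails; the file count and sizes are made up for illustration. Each file also costs at least one metadata operation no matter how small it is, which this sketch does not try to model.<br>
<pre>
```python
import math

BLOCK_SIZE = 4096  # assumed 4k block size, as mentioned in the message


def allocated_bytes(file_size, block_size=BLOCK_SIZE):
    """Bytes a file occupies when space is handed out in whole blocks."""
    return math.ceil(file_size / block_size) * block_size


# Hypothetical workload: 1000 data files of 1.5 KiB each.
file_sizes = [1536] * 1000
payload = sum(file_sizes)
on_disk = sum(allocated_bytes(size) for size in file_sizes)

print(payload, on_disk, round(on_disk / payload, 2))
```
</pre>
With these made-up numbers every file takes a full block, so the data occupies well over twice its payload size on disk.<br>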
> <br>
> Monitoring tools are essential. You need to know if a node is being slow, as<br>
> an MPI job runs only as fast as its slowest node.<br>
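<br>
The slowest-node point can be sketched in Python as follows. The node names, timings, and the 1.5x tolerance are hypothetical, and this is not taken from any particular monitoring tool; it only shows that a synchronized step costs as much as the worst node and how a slow node might be flagged against the median.<br>
<pre>
```python
from statistics import median


def step_time(node_times):
    """A synchronized step finishes only when the slowest node does."""
    return max(node_times.values())


def stragglers(node_times, tolerance=1.5):
    """Nodes whose time exceeds tolerance times the median time."""
    typical = median(node_times.values())
    return sorted(
        node for node, secs in node_times.items() if secs > tolerance * typical
    )


# Hypothetical per-node wall-clock seconds for one compute step.
timings = {"node01": 10.1, "node02": 9.8, "node03": 10.3, "node04": 24.7}

print(step_time(timings))   # 24.7
print(stragglers(timings))  # ['node04']
```
</pre>
Three of the four nodes finish in about 10 seconds, but the step as a whole still takes 24.7.<br>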
> <br>
> Job scheduling - Slurm. Don't waste time with anything else unless a<br>
> hyperspecific need for another scheduler is made mandatory. In that case,<br>
> push back like mad until Slurm is chosen anyway. All schedulers are broken<br>
> in some way. But Slurm solved the brokenness of all the others to make its<br>
> own little problems.<br>
> <br>
> On Sun, Dec 8, 2024, 8:38 AM Leam Hall via Ale <<a href="mailto:ale@ale.org" target="_blank" rel="noreferrer">ale@ale.org</a>> wrote:<br>
> <br>
>> I'd like to learn more about HPC and Linux. Anyone have resources to share?<br>
>><br>
>> Thanks!<br>
>><br>
>> Leam<br>
>> _______________________________________________<br>
>> Ale mailing list<br>
>> <a href="mailto:Ale@ale.org" target="_blank" rel="noreferrer">Ale@ale.org</a><br>
>> <a href="https://mail.ale.org/mailman/listinfo/ale" rel="noreferrer noreferrer" target="_blank">https://mail.ale.org/mailman/listinfo/ale</a><br>
>> See JOBS, ANNOUNCE and SCHOOLS lists at<br>
>> <a href="http://mail.ale.org/mailman/listinfo" rel="noreferrer noreferrer" target="_blank">http://mail.ale.org/mailman/listinfo</a><br>
>><br>
> <br>
> <br>
<br>
-- <br>
Linux Software Engineer  (<a href="http://reuel.net/resume" rel="noreferrer noreferrer" target="_blank">reuel.net/resume</a>)<br>
Scribe: The Domici War  (<a href="http://domiciwar.net" rel="noreferrer noreferrer" target="_blank">domiciwar.net</a>)<br>
Coding Ne'er-do-well   (<a href="http://github.com/LeamHall" rel="noreferrer noreferrer" target="_blank">github.com/LeamHall</a>)<br>
<br>
Between "can" and "can't" is a gap of "I don't know", a place of discovery. For the passionate, much of "can't" falls into "yet". -- lh<br>
<br>
Practice allows options and foresight. -- lh<br>
</blockquote></div>