<div dir="auto">You just missed SC24 (Supercomputing 2024) in Atlanta.<div dir="auto"><br></div><div dir="auto">A major component of HPC is the ability to coordinate computation across multiple physical compute nodes. That coordination is done by message passing through MPI, with MPICH being a top contender among the implementations.</div><div dir="auto"><br></div><div dir="auto">Once the compute nodes are all running the same code at once, the next big bottleneck is filesystem I/O. Big filesystems, in the petabyte range, have challenges with locking, and all of them have a hard time with small files. With new drives defaulting to a 4K block size and many hundreds or thousands of data files in the 1-2 KB range, the metadata operations turn into a choke point. This is mostly a coding problem, as devs don't always write for the specific hardware nearly as well as is needed. The analogy I used once was "unless the engine in that Ferrari was designed for regular gas, you will not get Ferrari performance if you fill it with QT regular gas".</div><div dir="auto"><br></div><div dir="auto">Monitoring tools are essential. You need to know if a node is being slow, because an MPI job runs only as fast as its slowest node.</div><div dir="auto"><br></div><div dir="auto">Job scheduling: Slurm. Don't waste time with anything else unless a hyperspecific need for another scheduler is made mandatory. In that case, push back like mad until Slurm is chosen anyway. All schedulers are broken in some way, but Slurm solved the brokenness of all the others and only made its own little problems.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Dec 8, 2024, 8:38 AM Leam Hall via Ale &lt;<a href="mailto:ale@ale.org">ale@ale.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="auto">I'd like to learn more about HPC and Linux. 
Anyone have resources to share?<div dir="auto"><br></div><div dir="auto">Thanks!</div><div dir="auto"><br></div><div dir="auto">Leam</div></div>
</blockquote></div>