<html><head></head><body>I've got job priority already. It's effectively nice levels at job start time. That won't get the scheduler to launch a job onto the busy node. Once there, yes, I can nice -19 Mary and Bob basically does nothing. But not exactly 0. <br><br>I'm beginning to see the scheduler as the sticking point. It needs to overload a node (or 20) since the other jobs will get nice +20 and Mary gets nice -19.<br><br><div class="gmail_quote">On February 8, 2021 3:47:01 PM EST, Chuck Payne &lt;terrorpup@gmail.com&gt; wrote:<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<div dir="auto">Is this where nice would come into play? Or using CPULimit on a job?</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Feb 8, 2021, 3:36 PM Jim Kinney via Ale &lt;<a href="mailto:ale@ale.org">ale@ale.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>I've been looking at criu. My use case is HPC. <br><br>On the performance issues: since Bob is not running while it gets paged out to swap, Mary will slow only a bit during the page-out time. Bob can suffer since Mary owns the hardware.<br><br>The thing criu does that I can't see a way to work with is the PID change on restore. <br><br>In SGE and variants, there's a shepherd process that manages the job process tree that's run on the HPC nodes. Criu would have to either pause the shepherd process for each job, which breaks the node daemon, or pause the job, which breaks the shepherd. <br><br>Granted, I'm still in theory land with no practical testing yet. <br><br>If only this HPC process actually worked with cgroups as is claimed....<br><br><div class="gmail_quote">On February 8, 2021 3:09:29 PM EST, Solomon Peachy via Ale &lt;<a href="mailto:ale@ale.org" target="_blank" rel="noreferrer">ale@ale.org</a>&gt; wrote:<blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<pre>On Mon, Feb 08, 2021 at 02:13:55PM -0500, Jim Kinney via Ale wrote:<br><blockquote class="gmail_quote" style="margin:0pt 0pt 1ex 0.8ex;border-left:1px solid #729fcf;padding-left:1ex">Will the kernel move Bob's process from ram to swap and back if it <br>sits in STOP for a while (hours to days)? Unknown how long after Mary <br>starts that it eats all the RAM.<br></blockquote><br>It won't automatically move Bob's process to swap in one fell swoop; <br>instead as Mary's process needs more RAM, Bob's will get incrementally <br>paged out as it's not actively being accessed.<br><br>And when Mary's is finished, once Bob's is allowed to resume, it will <br>get incrementally paged back in as its components are needed. (There's <br>probably a tunable or other mechanism to "encourage" it to page back in <br>more quickly, beyond running swapoff and forcing everything back.)<br><br>Performance is going to suffer while the paging is happening.<br><br>Perhaps a better option is the explicit checkpoint/restore mechanism using<br>the criu tool.<br><br> - Solomon</pre></blockquote></div><br>-- <br>Computers amplify human error<br>Super computers are really cool</div>_______________________________________________<br>
Ale mailing list<br>
<a href="mailto:Ale@ale.org" target="_blank" rel="noreferrer">Ale@ale.org</a><br>
<a href="https://mail.ale.org/mailman/listinfo/ale" rel="noreferrer noreferrer" target="_blank">https://mail.ale.org/mailman/listinfo/ale</a><br>
See JOBS, ANNOUNCE and SCHOOLS lists at<br>
<a href="http://mail.ale.org/mailman/listinfo" rel="noreferrer noreferrer" target="_blank">http://mail.ale.org/mailman/listinfo</a><br>
</blockquote></div>
</blockquote></div><br>-- <br>Computers amplify human error<br>Super computers are really cool</body></html>