Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

At the beginning of a PBS job wrapper script (invoked by verdi job worker), check if the job has sufficient time allocation remaining to complete the max estimated duration of a job. This ensures that the running job has at least X hours left in the PBS job for the node running the job, and exiting the verdi job worker if the PBS job does not have sufficient time remaining to complete the estimated job duration.

Inside job worker, before the start of each job, delegate to check login on Pleiades if have enough time for at least the next job duration (2-hours). Can run qstat -f $PBS_JOBID and compare the output for resources_used.walltime and Resources_List.walltime to see how much time remains. If insufficient time, then the pbs wrapper script can sigterm gracefully the PID of verdi job worker to trigger verdi job worker to exit the process and therefore end the PBS node. have job worker gracefully exit.

Based on current metrics, topsapp-singularity generating one S1-GUNW takes a mean of 1.5 hours on Broadwell. So add some margin and use that as max. e.g. requiring at least 2-hours remaining for current PBS job. if not, then exit the PBS job.

...