By Sinisa Veseli
It is often the case that certain classes of computing jobs have high priority and require immediate access to cluster resources in order to complete on time. For such jobs one could reserve a special queue with an assigned set of nodes, but this solution might waste resources when there are no high priority jobs running. An alternative approach to this problem in Grid Engine involves using subordinate queues, which allow preemption of job slots. The way this works is as follows:
- A higher priority queue is configured with one or more subordinate queues
- Jobs running in subordinated queues are suspended when the higher priority queue becomes busy, and they are resumed when the higher priority queue is not busy any longer
- For any subordinate queue one can configure the number of job slots that must be filled in the higher priority queue to trigger a suspension (the “Max Slots” parameter in the qmon “Subordinates” tab of the higher priority queue configuration). If “Max Slots” is not specified, then all job slots must be filled in the higher priority queue to trigger suspension of the subordinate queue.
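The trigger rule described in the list above can be modeled in a few lines of Python. This is only a sketch of the rule as stated here, not Grid Engine code: the function name and its parameters are made up for illustration, and treating the threshold as "this many or more filled slots" is an assumption.

```python
from typing import Optional


def subordinate_is_suspended(used_slots: int,
                             total_slots: int,
                             max_slots: Optional[int] = None) -> bool:
    """Return True if a subordinate queue should be suspended.

    used_slots and total_slots describe the higher priority queue.
    max_slots is the optional "Max Slots" threshold; when it is not set,
    all slots of the higher priority queue must be filled.
    """
    # Assumption: reaching or exceeding the threshold triggers suspension.
    threshold = total_slots if max_slots is None else max_slots
    return used_slots >= threshold


# A higher priority queue with 2 slots and no threshold configured.
print(subordinate_is_suspended(used_slots=1, total_slots=2))               # False
print(subordinate_is_suspended(used_slots=2, total_slots=2))               # True
# With a hypothetical threshold of 1, a single running job is enough.
print(subordinate_is_suspended(used_slots=1, total_slots=2, max_slots=1))  # True
```

Leaving the threshold unset is the most conservative choice: lower priority work is only suspended once the higher priority queue is completely full.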
To illustrate queue subordination, on my test machine I’ve set up three queues (“low”, “medium” and “high”) intended to run jobs with different priorities. The “high” queue has both “low” and “medium” queues as subordinates, while the “medium” queue has “low” as its subordinate:
user@sgetest> qconf -sq high | grep subordinate
subordinate_list low medium
user@sgetest> qconf -sq medium | grep subordinate
subordinate_list low
user@sgetest> qconf -sq low | grep subordinate
subordinate_list NONE
After submitting a low priority array job to the “low” queue, qstat returns the following information:
user@sgetest> qsub -t 1-10 -q low low_priority_job.sh
Your job-array 19.1-10:1 ("low_priority_job.sh") has been submitted
user@sgetest> qstat -f
queuename qtype used/tot. load_avg arch states
----------------------------------------------------------------------------
high@sgetest.univaud.com BIP 0/2 0.07 lx24-amd64
----------------------------------------------------------------------------
medium@sgetest.univaud.com BIP 0/2 0.07 lx24-amd64
----------------------------------------------------------------------------
low@sgetest.univaud.com BIP 2/2 0.07 lx24-amd64
19 0.55500 low_priori user r 11/24/2008 17:05:02 1 1
19 0.55500 low_priori user r 11/24/2008 17:05:02 1 2
############################################################################
- PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
19 0.55500 low_priori user qw 11/24/2008 17:04:50 1 3-10:1
Note that all available job slots on my test machine are full. Submission of the medium priority array job to the “medium” queue results in suspension of the previously running low priority tasks (this is indicated by the letter “S” next to the task listing in the qstat output):
user@sgetest> qsub -t 1-10 -q medium medium_priority_job.sh
Your job-array 20.1-10:1 ("medium_priority_job.sh") has been submitted
user@sgetest> qstat -f
queuename qtype used/tot. load_avg arch states
----------------------------------------------------------------------------
high@sgetest.univaud.com BIP 0/2 0.06 lx24-amd64
----------------------------------------------------------------------------
medium@sgetest.univaud.com BIP 2/2 0.06 lx24-amd64
20 0.55500 medium_pri user r 11/24/2008 17:05:17 1 1
20 0.55500 medium_pri user r 11/24/2008 17:05:17 1 2
----------------------------------------------------------------------------
low@sgetest.univaud.com BIP 2/2 0.06 lx24-amd64 S
19 0.55500 low_priori user S 11/24/2008 17:05:02 1 1
19 0.55500 low_priori user S 11/24/2008 17:05:02 1 2
############################################################################
- PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
19 0.55500 low_priori user qw 11/24/2008 17:04:50 1 3-10:1
20 0.55500 medium_pri user qw 11/24/2008 17:05:15 1 3-10:1
Finally, submission of a high priority array job to the “high” queue results in suspension of the previously running medium priority tasks:
user@sgetest> qsub -t 1-10 -q high high_priority_job.sh
Your job-array 21.1-10:1 ("high_priority_job.sh") has been submitted
user@sgetest> qstat -f
queuename qtype used/tot. load_avg arch states
----------------------------------------------------------------------------
high@sgetest.univaud.com BIP 2/2 0.06 lx24-amd64
21 0.55500 high_prior user r 11/24/2008 17:06:02 1 1
21 0.55500 high_prior user r 11/24/2008 17:06:02 1 2
----------------------------------------------------------------------------
medium@sgetest.univaud.com BIP 2/2 0.06 lx24-amd64 S
20 0.55500 medium_pri user S 11/24/2008 17:05:17 1 1
20 0.55500 medium_pri user S 11/24/2008 17:05:17 1 2
----------------------------------------------------------------------------
low@sgetest.univaud.com BIP 2/2 0.06 lx24-amd64 S
19 0.55500 low_priori user S 11/24/2008 17:05:02 1 1
19 0.55500 low_priori user S 11/24/2008 17:05:02 1 2
############################################################################
- PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
19 0.55500 low_priori user qw 11/24/2008 17:04:50 1 3-10:1
20 0.55500 medium_pri user qw 11/24/2008 17:05:15 1 3-10:1
21 0.00000 high_prior user qw 11/24/2008 17:05:52 1 3-10:1
Medium priority tasks will be resumed after all high priority tasks are done, and low priority tasks will be resumed after the medium priority job is finished.
One thing worth pointing out is that Grid Engine queue subordination is implemented at the “queue instance” level, that is, separately on each host. In other words, if I had machine "A" associated with my queue “low”, but not with queues “high” or “medium”, jobs running on machine "A" would not be suspended even if there were higher priority jobs waiting to be scheduled.
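To make the per-host behavior concrete, here is a small Python model of it. It is a sketch under simplifying assumptions, not how Grid Engine is implemented: the function name and data layout are hypothetical, no slot thresholds are configured, and a queue on a host counts as busy as soon as all of its slots there are filled. The host "A" is the hypothetical machine from the paragraph above, which carries only the “low” queue.

```python
def suspended_instances(subordinates, slots_used, slots_total):
    """Return the set of (queue, host) pairs that would be suspended.

    subordinates: dict mapping a queue name to its subordinate queue names
    slots_used:   dict mapping (queue, host) to the number of filled slots
    slots_total:  dict mapping (queue, host) to the number of configured slots
    """
    suspended = set()
    for (queue, host), used in slots_used.items():
        total = slots_total[(queue, host)]
        # Assumption: no thresholds set, so the queue must be full on this host.
        if total > 0 and used >= total:
            for sub in subordinates.get(queue, []):
                # Only the subordinate's instance on the same host is affected.
                if (sub, host) in slots_total:
                    suspended.add((sub, host))
    return suspended


# Same subordination setup as in the example: high > medium > low.
subordinates = {"high": ["low", "medium"], "medium": ["low"]}

# "sgetest" hosts all three queues; hypothetical host "A" hosts only "low".
slots_total = {
    ("high", "sgetest"): 2,
    ("medium", "sgetest"): 2,
    ("low", "sgetest"): 2,
    ("low", "A"): 2,
}
# Every slot is occupied everywhere.
slots_used = dict(slots_total)

print(sorted(suspended_instances(subordinates, slots_used, slots_total)))
# [('low', 'sgetest'), ('medium', 'sgetest')]
```

The instance of “low” on host "A" never appears in the result, because no higher priority queue has an instance on that host.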