Monday, December 8, 2008

The Private Clouds

By Ignacio Martin Llorente

Last month I was invited to give a couple of talks about Cloud computing at the wonderful C3RS (Cisco Cloud Computing Research Symposium). The slides are available online, if you want to check them out. Although the audiences were quite heterogeneous, there is a recurrent question among the participants of these events: How can I set up my private cloud? Let me briefly summarize the motivation of the people asking this:

  • Lease compute capacity from the local infrastructure. These people acknowledge the benefits of virtualizing their own infrastructure as a whole. However, they are not interested in selling this capacity over the Internet, or at least it is not a priority for them. That is, they do not want to become an EC2 competitor, so they do not need to expose a cloud interface to the world.
  • Capacity in the cloud. They do not want to be the new EC2, but they do want to use EC2. The ability to move some services, or part of the capacity of a service, to an external provider is very attractive to them.
  • Open source. Current cloud solutions are proprietary and closed, so they need an open source solution to play with. Also, they are already using some virtualization technologies that they would like to see integrated in the final solution.

I tell these people: take a look at OpenNebula. OpenNebula is a distributed virtual machine manager that allows you to virtualize your infrastructure. It also features integrated management of your virtual services, including networking and image management. Additionally, it ships with EC2 plug-ins that allow you to deploy virtual machines in your local infrastructure and in Amazon EC2 simultaneously.
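
To give a flavor of how this works in practice, the sketch below shows what an OpenNebula virtual machine template might look like. The attribute names follow the OpenNebula template format, but the specific values (image path, network name, AMI identifier, keypair) are placeholders of my own, not something from the original post:

NAME   = web-server
CPU    = 1
MEMORY = 512
# local deployment: a disk image and virtual network registered in OpenNebula
DISK   = [ source = "/srv/images/webserver.img", target = "hda" ]
NIC    = [ network = "Public" ]
# hybrid deployment: attributes consumed by the EC2 plug-in (example values)
EC2    = [ AMI = "ami-00000000", KEYPAIR = "mykeypair", INSTANCETYPE = "m1.small" ]

A template along these lines is submitted with the onevm command-line tool (for example, onevm create web-server.template), and OpenNebula then decides whether the machine is started on a local host or, through the EC2 plug-in, on Amazon EC2.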

OpenNebula is modular by design to allow its integration with other tools, like the Haizea lease manager, or Nimbus, which gives you an EC2-compatible interface in case you need one. It is healthy open source software, being improved in several projects like RESERVOIR, and it has a growing community.

Go here if you want to set up your private cloud!

Ruben S. Montero

Reprinted from blog.dsa-research.org 

Friday, December 5, 2008

Using Grid Engine Subordinate Queues

By Sinisa Veseli

It is often the case that certain classes of computing jobs have high priority and require immediate access to cluster resources in order to complete on time. For such jobs one could reserve a special queue with an assigned set of nodes, but this solution might waste resources when there are no high priority jobs running. An alternative approach to this problem in Grid Engine involves using subordinate queues, which allow preemption of job slots. The way this works is as follows:

  1. A higher priority queue is configured with one or more subordinate queues.
  2. Jobs running in subordinate queues are suspended when the higher priority queue becomes busy, and they are resumed when the higher priority queue is no longer busy.
  3. For any subordinate queue one can configure the number of job slots (the “Max Slots” parameter in the qmon “Subordinates” tab of the higher priority queue configuration) that must be filled in the higher priority queue to trigger a suspension. If “max slots” is not specified, then all job slots must be filled in the higher priority queue to trigger suspension of the subordinate queue; a command-line sketch of this configuration follows the list.
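
The same relationship can also be expressed on the command line through the queue's “subordinate_list” attribute. The sketch below is illustrative rather than taken from my actual setup: the queue names match the example that follows, the “=1” slot threshold is an arbitrary choice, and you could equally edit the full queue configuration with “qconf -mq high” instead of using “-mattr”:

user@sgetest> qconf -mattr queue subordinate_list "low=1" high

With “low=1”, the “low” queue would be suspended as soon as a single slot in “high” is in use; leaving out “=1” reverts to the default of suspending only when “high” is completely full.
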
In order to illustrate this, on my test machine I’ve set up three queues (“low”, “medium” and “high”) intended to run jobs with different priorities. The “high” queue has both the “low” and “medium” queues as subordinates, while the “medium” queue has “low” as its subordinate:
user@sgetest> qconf -sq high | grep subordinate
subordinate_list      low medium
user@sgetest> qconf -sq medium | grep subordinate
subordinate_list      low
user@sgetest> qconf -sq low | grep subordinate
subordinate_list      NONE
After submitting a low priority array job to the “low” queue, qstat returns the following information:
user@sgetest> qsub -t 1-10 -q low low_priority_job.sh
Your job-array 19.1-10:1 ("low_priority_job.sh") has been submitted
user@sgetest> qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
high@sgetest.univaud.com      BIP   0/2       0.07     lx24-amd64
----------------------------------------------------------------------------
medium@sgetest.univaud.com    BIP   0/2       0.07     lx24-amd64
----------------------------------------------------------------------------
low@sgetest.univaud.com       BIP   2/2       0.07     lx24-amd64
19 0.55500 low_priori user       r     11/24/2008 17:05:02     1 1
19 0.55500 low_priori user       r     11/24/2008 17:05:02     1 2

############################################################################
- PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
19 0.55500 low_priori user       qw    11/24/2008 17:04:50     1 3-10:1
Note that all available job slots on my test machine are full. Submission of the medium priority array job to the “medium” queue results in suspension of the previously running low priority tasks (this is indicated by the letter “S” next to the task listing in the qstat output):
user@sgetest> qsub -t 1-10 -q medium medium_priority_job.sh
Your job-array 20.1-10:1 ("medium_priority_job.sh") has been submitted
user@sgetest> qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
high@sgetest.univaud.com      BIP   0/2       0.06     lx24-amd64
----------------------------------------------------------------------------
medium@sgetest.univaud.com    BIP   2/2       0.06     lx24-amd64
20 0.55500 medium_pri user       r     11/24/2008 17:05:17     1 1
20 0.55500 medium_pri user       r     11/24/2008 17:05:17     1 2
----------------------------------------------------------------------------
low@sgetest.univaud.com       BIP   2/2       0.06     lx24-amd64    S
19 0.55500 low_priori user       S     11/24/2008 17:05:02     1 1
19 0.55500 low_priori user       S     11/24/2008 17:05:02     1 2

############################################################################
- PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
19 0.55500 low_priori user       qw    11/24/2008 17:04:50     1 3-10:1
20 0.55500 medium_pri user       qw    11/24/2008 17:05:15     1 3-10:1
Finally, submission of a high priority array job to the “high” queue results in the previously running medium priority tasks being suspended:
user@sgetest> qsub -t 1-10 -q high high_priority_job.sh
Your job-array 21.1-10:1 ("high_priority_job.sh") has been submitted
user@sgetest> qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
high@sgetest.univaud.com      BIP   2/2       0.06     lx24-amd64
21 0.55500 high_prior user       r     11/24/2008 17:06:02     1 1
21 0.55500 high_prior user       r     11/24/2008 17:06:02     1 2
----------------------------------------------------------------------------
medium@sgetest.univaud.com    BIP   2/2       0.06     lx24-amd64    S
20 0.55500 medium_pri user       S     11/24/2008 17:05:17     1 1
20 0.55500 medium_pri user       S     11/24/2008 17:05:17     1 2
----------------------------------------------------------------------------
low@sgetest.univaud.com       BIP   2/2       0.06     lx24-amd64    S
19 0.55500 low_priori user       S     11/24/2008 17:05:02     1 1
19 0.55500 low_priori user       S     11/24/2008 17:05:02     1 2

############################################################################
- PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
19 0.55500 low_priori user       qw    11/24/2008 17:04:50     1 3-10:1
20 0.55500 medium_pri user       qw    11/24/2008 17:05:15     1 3-10:1
21 0.00000 high_prior user       qw    11/24/2008 17:05:52     1 3-10:1


Medium priority tasks will be resumed after all high priority tasks are done, and low priority tasks will run after the medium priority job is finished.

One thing worth pointing out is that Grid Engine queue subordination is implemented at the queue instance level. In other words, if I had machine "A" associated with my queue “low”, but not with queues “high” or “medium”, jobs running on machine "A" would not be suspended even if there were higher priority jobs waiting to be scheduled.

Tuesday, December 2, 2008

Managing Resource Quotas in Grid Engine

By Sinisa Veseli

It is often the case that cluster administrators must impose limits on the use of certain resources. A good example here would be preventing a particular user (or set of users) from utilizing an entire queue (or cluster) at any point. If you’ve ever tried doing something like that in Grid Engine (SGE), then you know that it is not immediately obvious how to impose limits on resource usage.

SGE has a concept of “resource quota sets” (RQS), which can be used to limit maximum resource consumption by any job. The relevant qconf command line switches for manipulating resource quota sets are “-srqs” and “-srqsl” (show), “-arqs” (add), “-mrqs” (modify) and “-drqs” (delete).

Each RQS must have the following parameters: name, description, enabled and limit. The RQS name cannot contain spaces, but its description can be an arbitrary string. The boolean “enabled” flag specifies whether the RQS is enabled or not, while each “limit” field denotes a resource quota rule consisting of an optional name, filters for a specific job request, and the resource quota limit. Note that one can have multiple “limit” fields associated with a given RQS. For example, the following RQS prevents user “ahogger” from occupying more than 1 job slot in general, and also prevents the same user from running jobs in the headnodes.q queue:

$ qconf -srqs ahogger_job_limit
{
name         ahogger_job_limit
description  "limit ahogger jobs"
enabled      TRUE
limit        users ahogger to slots=1
limit        users {ahogger} queues {headnodes.q} to slots=0
}
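
To put a rule set like this in place yourself, the natural route is the “-arqs” switch mentioned above, and the qquota utility can then show which quota rules are currently being applied. A brief sketch, reusing the rule set name from the example above:

$ qconf -arqs ahogger_job_limit
$ qquota -u ahogger

The first command opens an editor in which the rule set body shown above can be entered; the second lists the quotas currently in effect for that user.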


The exact format in which RQS have to be specified is, like everything else, well documented in SGE man pages (“man sge_resource_quota”).