Tuesday, December 2, 2008

Managing Resource Quotas in Grid Engine

By Sinisa Veseli

Cluster administrators often need to limit the use of certain resources. A good example is preventing a particular user (or a set of users) from occupying an entire queue (or cluster) at any point. If you’ve ever tried doing something like that in Grid Engine (SGE), then you know that it is not immediately obvious how to impose limits on resource usage.

SGE has a concept of “resource quota sets” (RQS), which can be used to limit maximum resource consumption by any job. The relevant qconf command line switches for manipulating resource quota sets are “-srqs” and “-srqsl” (show), “-arqs” (add), “-mrqs” (modify) and “-drqs” (delete).
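
As a quick illustration of the “show” switches, here is a minimal sketch that lists the defined resource quota sets and then prints their definitions. It assumes qconf is on the PATH (that is, the SGE settings file has been sourced). The rqs_status variable is just a helper introduced for this example.

```shell
#!/bin/sh
# Inspect the resource quota sets known to the cluster.
# Assumption: qconf is on PATH (the SGE settings file has been sourced).
if command -v qconf >/dev/null 2>&1; then
    rqs_status="available"
    qconf -srqsl    # names of all defined resource quota sets
    qconf -srqs     # definitions of the sets (a name can be given to show just one)
else
    rqs_status="missing"
    echo "qconf not found; source the SGE settings file first" >&2
fi
```

The add, modify and delete switches change the cluster configuration, so they typically have to be run by an SGE manager.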

Each RQS must have the following parameters: name, description, enabled and limit. The RQS name cannot contain spaces, but its description can be an arbitrary string. The boolean “enabled” flag specifies whether the RQS is enabled, while the “limit” field denotes a resource quota rule that consists of an optional name, filters for a specific job request, and the resource quota limit. Note that one can have multiple “limit” fields associated with a given RQS. Rules within a set are evaluated in order and the first rule that matches a request is the one applied, so more specific rules should be listed first. For example, the following RQS keeps user “ahogger” from running jobs in the headnodes.q queue, and otherwise prevents the same user from occupying more than 1 job slot:

$ qconf -srqs ahogger_job_limit
{
name         ahogger_job_limit
description  "limit ahogger jobs"
enabled      TRUE
limit        users {ahogger} queues {headnodes.q} to slots=0
limit        users ahogger to slots=1
}
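
A set like this can also be created without the interactive editor. The sketch below writes the two rules to a file and loads them with “qconf -Arqs” (add from file). The file location is a hypothetical choice made for this example, and the script assumes qconf is on the PATH and is run by a user with SGE manager privileges. Without qconf it only writes the file. The queue-specific rule is listed first because rules in a set are evaluated in order and the first match is applied.

```shell
#!/bin/sh
# Write a resource quota set to a file, then load it if qconf is present.
# The file name below is a hypothetical choice for this sketch.
rqs_file="${TMPDIR:-/tmp}/ahogger_job_limit.rqs"

cat > "$rqs_file" <<'EOF'
{
   name         ahogger_job_limit
   description  "limit ahogger jobs"
   enabled      TRUE
   limit        users {ahogger} queues {headnodes.q} to slots=0
   limit        users ahogger to slots=1
}
EOF

if command -v qconf >/dev/null 2>&1; then
    qconf -Arqs "$rqs_file"          # add the set from the file
    qconf -srqs ahogger_job_limit    # show what was stored
else
    echo "qconf not found; wrote $rqs_file only" >&2
fi
```

To remove the set again, “qconf -drqs ahogger_job_limit” should do the job.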


The exact format in which an RQS has to be specified is, like everything else, well documented in the SGE man pages (“man sge_resource_quota”).
