Wednesday, May 6, 2009

Parsing SGE Accounting File

By Rich Wellner

Anyone managing an HPC cluster has probably wondered at some point about the overall performance and usage of their cluster. How many jobs were completed last month? What was the average job duration? How long were jobs pending in the queue? How many CPU slots did they require? These are all good questions, with answers buried somewhere in your DRM's accounting files.

If you are using Grid Engine, and assuming you have the usual “default cell” installation, the relevant file is $SGE_ROOT/default/common/accounting. The corresponding command that extracts information from this file is “qacct”. If you read its man page (“man qacct”), you will notice that qacct produces summaries of wall-clock, CPU and system time for different categories such as hostname, queue name, owner name, etc., so there is a good chance that the information you are looking for is readily available. If, however, you happen to be looking for something that qacct does not provide, the accounting file is formatted for easy parsing. Each line in the file corresponds to one computing task, and there are more than 40 accounting fields (separated by the ‘:’ character) on each line. The meaning of each field is documented in the man pages (“man accounting”), so getting the information you need with standard UNIX tools should not be difficult at all.
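As a sketch of what that parsing might look like, here is a short Python function that computes the average wall-clock and pending times across all records. The field positions below are taken from “man accounting” (qname, hostname, group, owner, job_name, job_number, account, priority, submission_time, start_time, end_time, ...); verify them against your own Grid Engine version before relying on this:

```python
#!/usr/bin/env python
# Sketch: summarize an SGE accounting file with plain Python.
# Field positions follow "man accounting": qname(1), hostname(2), group(3),
# owner(4), job_name(5), job_number(6), account(7), priority(8),
# submission_time(9), start_time(10), end_time(11), failed(12),
# exit_status(13), ru_wallclock(14), ... -- verify against your SGE version.

def summarize(lines):
    """Return (job_count, avg_wallclock_sec, avg_pending_sec)."""
    count, total_wall, total_pending = 0, 0.0, 0.0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue                      # skip comments and blank lines
        fields = line.rstrip("\n").split(":")
        if len(fields) < 14:
            continue                      # malformed record
        submission = float(fields[8])     # submission_time
        start = float(fields[9])          # start_time
        end = float(fields[10])           # end_time
        count += 1
        total_wall += end - start         # wall-clock duration
        total_pending += start - submission  # time spent pending in queue
    if count == 0:
        return 0, 0.0, 0.0
    return count, total_wall / count, total_pending / count
```

Calling summarize() on an open accounting file handle returns the job count plus the average wall-clock and pending times in seconds; extending it to group by owner or queue is a matter of a few more lines.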

Monday, December 8, 2008

The Private Clouds

By Ignacio Martin Llorente

Last month I was invited to give a couple of talks about Cloud computing at the wonderful C3RS (Cisco Cloud Computing Research Symposium). The slides are available online if you want to check them out. Although the audiences were quite heterogeneous, there was a recurrent question among the participants of these events: How can I set up my private cloud? Let me briefly summarize the motivation of the people asking this:

  • Lease compute capacity from the local infrastructure. These people acknowledge the benefits of virtualizing their infrastructure as a whole. However, they are not interested in selling this capacity over the Internet, or at least it is not a priority for them. That is, they do not want to become an EC2 competitor, so they do not need to expose a cloud interface to the world.
  • Capacity in the cloud. They do not want to be the new EC2, but they do want to use EC2. The ability to move some services, or part of the capacity of a service, to an external provider is very attractive to them.
  • Open source. Current cloud solutions are proprietary and closed; they need an open source solution to experiment with. Also, they are already using virtualization technologies that they would like to see integrated in the final solution.

I tell these people to take a look at OpenNebula. OpenNebula is a distributed virtual machine manager that allows you to virtualize your infrastructure. It also features integral management of your virtual services, including networking and image management. Additionally, it ships with EC2 plug-ins that allow you to deploy virtual machines in your local infrastructure and in Amazon EC2 simultaneously.

OpenNebula is modular by design to allow integration with other tools, such as the Haizea lease manager, or Nimbus, which gives you an EC2-compatible interface in case you need one. It is healthy open source software, being improved in several projects such as RESERVOIR, and it has a growing community.

Go here if you want to set up your private cloud!

Ruben S. Montero

Reprinted from blog.dsa-research.org 

Friday, December 5, 2008

Using Grid Engine Subordinate Queues

By Sinisa Veseli

It is often the case that certain classes of computing jobs have high priority and require immediate access to cluster resources in order to complete on time. For such jobs one could reserve a special queue with an assigned set of nodes, but this solution might waste resources when no high priority jobs are running. An alternative approach in Grid Engine involves using subordinate queues, which allow preemption of job slots. This works as follows:

  1. A higher priority queue is configured with one or more subordinate queues.
  2. Jobs running in subordinate queues are suspended when the higher priority queue becomes busy, and they are resumed when the higher priority queue is no longer busy.
  3. For any subordinate queue one can configure the number of job slots (the “Max Slots” parameter in the qmon “Subordinates” tab of the higher priority queue configuration) that must be filled in the higher priority queue to trigger a suspension. If “max slots” is not specified, then all job slots must be filled in the higher priority queue to trigger suspension of the subordinate queue.
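For reference, the same slot threshold can be set outside of qmon by editing the queue configuration with qconf. A sketch (the queue names are assumed, not taken from a live system):

```shell
# Inspect the current subordinate list of the "high" queue
qconf -sq high | grep subordinate_list

# Edit the queue configuration; a line such as
#   subordinate_list      low=1 medium=1
# suspends "low" and "medium" as soon as one slot in "high" is in use,
# while plain "subordinate_list low medium" (no "=N") requires all
# slots in "high" to fill before suspension is triggered.
qconf -mq high
```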
To illustrate this, on my test machine I’ve set up three queues (“low”, “medium” and “high”) intended to run jobs with different priorities. The “high” queue has both “low” and “medium” queues as subordinates, while the “medium” queue has “low” as its subordinate:
user@sgetest> qconf -sq high | grep subordinate
subordinate_list      low medium
user@sgetest> qconf -sq medium | grep subordinate
subordinate_list      low
user@sgetest> qconf -sq low | grep subordinate
subordinate_list      NONE
After submitting a low priority array job to the “low” queue, qstat returns the following information:
user@sgetest> qsub -t 1-10 -q low low_priority_job.sh
Your job-array 19.1-10:1 ("low_priority_job.sh") has been submitted
user@sgetest> qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
high@sgetest.univaud.com    BIP   0/2       0.07     lx24-amd64
----------------------------------------------------------------------------
medium@sgetest.univaud.com    BIP   0/2       0.07     lx24-amd64
----------------------------------------------------------------------------
low@sgetest.univaud.com    BIP   2/2       0.07     lx24-amd64
19 0.55500 low_priori user       r     11/24/2008 17:05:02     1 1
19 0.55500 low_priori user       r     11/24/2008 17:05:02     1 2

############################################################################
- PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
19 0.55500 low_priori user       qw    11/24/2008 17:04:50     1 3-10:1
Note that all available job slots on my test machine are full. Submission of the medium priority array job to the “medium” queue results in suspension of the previously running low priority tasks (this is indicated by the letter “S” next to the task listing in the qstat output):
user@sgetest> qsub -t 1-10 -q medium medium_priority_job.sh
Your job-array 20.1-10:1 ("medium_priority_job.sh") has been submitted
user@sgetest> qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
high@sgetest.univaud.com    BIP   0/2       0.06     lx24-amd64
----------------------------------------------------------------------------
medium@sgetest.univaud.com    BIP   2/2       0.06     lx24-amd64
20 0.55500 medium_pri user       r     11/24/2008 17:05:17     1 1
20 0.55500 medium_pri user       r     11/24/2008 17:05:17     1 2
----------------------------------------------------------------------------
low@sgetest.univaud.com    BIP   2/2       0.06     lx24-amd64    S
19 0.55500 low_priori user       S     11/24/2008 17:05:02     1 1
19 0.55500 low_priori user       S     11/24/2008 17:05:02     1 2

############################################################################
- PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
19 0.55500 low_priori user       qw    11/24/2008 17:04:50     1 3-10:1
20 0.55500 medium_pri user       qw    11/24/2008 17:05:15     1 3-10:1
Finally, submission of a high priority array job to the “high” queue results in the previously running medium priority tasks being suspended:
user@sgetest> qsub -t 1-10 -q high high_priority_job.sh
Your job-array 21.1-10:1 ("high_priority_job.sh") has been submitted
user@sgetest> qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
high@sgetest.univaud.com    BIP   2/2       0.06     lx24-amd64
21 0.55500 high_prior user       r     11/24/2008 17:06:02     1 1
21 0.55500 high_prior user       r     11/24/2008 17:06:02     1 2
----------------------------------------------------------------------------
medium@sgetest.univaud.com    BIP   2/2       0.06     lx24-amd64    S
20 0.55500 medium_pri user       S     11/24/2008 17:05:17     1 1
20 0.55500 medium_pri user       S     11/24/2008 17:05:17     1 2
----------------------------------------------------------------------------
low@sgetest.univaud.com    BIP   2/2       0.06     lx24-amd64    S
19 0.55500 low_priori user       S     11/24/2008 17:05:02     1 1
19 0.55500 low_priori user       S     11/24/2008 17:05:02     1 2

############################################################################
- PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
19 0.55500 low_priori user       qw    11/24/2008 17:04:50     1 3-10:1
20 0.55500 medium_pri user       qw    11/24/2008 17:05:15     1 3-10:1
21 0.00000 high_prior user       qw    11/24/2008 17:05:52     1 3-10:1


Medium priority tasks will resume after all high priority tasks are done, and low priority tasks will run after the medium priority job finishes.

One thing worth pointing out is that Grid Engine queue subordination is implemented at the queue instance level. In other words, if machine "A" were associated with my queue “low”, but not with queues “high” or “medium”, jobs running on machine "A" would not be suspended even if higher priority jobs were waiting to be scheduled.

Tuesday, December 2, 2008

Managing Resource Quotas in Grid Engine

By Sinisa Veseli

It is often the case that cluster administrators must impose limits on the use of certain resources. A good example would be preventing a particular user (or set of users) from utilizing an entire queue (or the entire cluster) at any point. If you’ve ever tried doing something like that in Grid Engine (SGE), then you know it is not immediately obvious how to impose limits on resource usage.

SGE has a concept of “resource quota sets” (RQS), which can be used to limit maximum resource consumption by any job. The relevant qconf command line switches for manipulating resource quota sets are “-srqs” and “-srqsl” (show), “-arqs” (add), “-mrqs” (modify) and “-drqs” (delete).

Each RQS must have the following parameters: name, description, enabled and limit. The RQS name cannot contain spaces, but its description can be an arbitrary string. The boolean “enabled” flag specifies whether the RQS is enabled, while each “limit” field denotes a resource quota rule consisting of an optional name, filters for specific job requests and the resource quota limit. Note that one can have multiple “limit” fields associated with a given RQS. For example, the following RQS prevents user “ahogger” from occupying more than 1 job slot in general, and it also prevents the same user from running jobs in the headnodes.q queue:

$ qconf -srqs ahogger_job_limit
{
name         ahogger_job_limit
description  "limit ahogger jobs"
enabled      TRUE
limit        users ahogger to slots=1
limit        users {ahogger} queues {headnodes.q} to slots=0
}


The exact format in which an RQS has to be specified is, like everything else, well documented in the SGE man pages (“man sge_resource_quota”).
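Once a quota set is in place, the qquota command can be used to verify which limits are actually being applied to a user; for example (the user name is the one from the sketch above):

```shell
# Show resource quotas currently in effect for user ahogger
qquota -u ahogger

# List the names of all configured resource quota sets
qconf -srqsl
```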

Monday, November 17, 2008

Automating Grid Engine Monitoring

By Sinisa Veseli

When visiting client sites I often notice various issues with the existing distributed resource management software installations. The problems usually vary from configuration issues to queues in an error state. While things like inadequate resources and queue structure usually require more analysis and better design, problems like queues in an error state are easily detectable. So, cluster administrators, who are often busy with many other duties, should try to automate monitoring tasks as much as they can. For example, if you are using Grid Engine, you can easily come up with scripts like the one below, which looks for several different kinds of problems in your SGE installation:

#!/bin/sh

. /usr/local/unicluster/unicluster-user-env.sh

explainProblem() {
    qHost=$1   # queue instance where the problem was found
    msg=`qstat -f -q $qHost -explain aAEc | tail -1 | sed 's?-??g' | sed '/^$/d'`
    echo $msg
}

checkProblem() {
    description=$1  # problem description
    signature=$2    # problem signature (state letter in qstat output)
    for q in `qconf -sql`; do
        cmd="qstat -f -q $q | grep $q | awk '{if(NF>5 && index(\$NF, \"$signature\")>0) print \$1}'"
        qHostList=`eval $cmd`
        if [ "$qHostList" != "" ]; then
            for qHost in $qHostList; do
                msg=`explainProblem $qHost`
                echo "$description on $qHost:"
                echo "  $msg"
                echo ""
            done
        fi
    done
}

echo "Grid Engine Issue Summary"
echo "========================="
echo ""
checkProblem Error E
checkProblem SuspendThreshold A
checkProblem Alarm a
checkProblem ConfigProblem c


Note that the above script should work with UniCluster Express 3.2 installed in the default (/usr/local/unicluster) location. It can easily be modified to, for example, send email to administrators when problems that need attention are found. Although simple, such scripts usually go a long way toward ensuring that your Grid Engine installation operates smoothly.
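As one possible extension (the paths and address here are hypothetical), the script could be wrapped so that administrators are emailed only when the summary actually contains issue reports:

```shell
#!/bin/sh
# Hypothetical wrapper around the monitoring script above.
# Counts lines that look like issue reports ("<Description> on <queue>@<host>:")
# and mails the full summary only when at least one is present.
REPORT=`/usr/local/bin/sge_check.sh`
ISSUES=`echo "$REPORT" | grep -c " on .*:"`
if [ "$ISSUES" -gt 0 ]; then
    echo "$REPORT" | mail -s "Grid Engine issues found" admin@example.com
fi
```

A crontab entry such as `0 * * * * /usr/local/bin/sge_check_wrapper.sh` would then run the check hourly.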

Thursday, November 6, 2008

Who Cares What's inside a Cloud?

By Roderick Flores

When I consider my microwave, telephone, or television I see fairly sophisticated applications that I simply plug into service providers and get useful results. If I choose to switch between individual service providers I can do so easily (assuming certain levels of deregulation of utility monopolies, of course). Most importantly, while I understand how these appliances work, I would never want to build one myself. Yet I am not required to do so, because the providers use standardized interfaces that appliance manufacturers can easily offer: I buy my appliances as I might any other tool. Consequently, I can switch out the manufacturer or model for each of the services I use without interacting with the provider. I use these tools in a way that makes my work and life more efficient.



Nobody listens in on my conversations, nor does anyone receive services at my expense. I can use these services how I wish, and, because of competition, I can expect an outstanding quality of service. At the end of the month, I get a bill from my providers for the services I used. These monetary costs are far outweighed by the convenience these services offer.



It is this sort of operational simplicity that motivated the first call for computational power as a utility in 1965. Like the electrical grid, a consumer would simply plug in their favorite application and use the compute power offered by a provider. Beginning in the 1990s, this effort centered around the concept of Grid computing.



Just like the early days of electricity service, there were many issues with providing Grid computing. The very first offerings were proprietary or narrowly focused. The parallels with the electric industry are easily recognized: some might provide street lighting, whereas others would provide power for home lighting, still others for transportation, and yet another group for industrial applications. Moreover, each provider used different interfaces to get the power. Thus switching between providers, not a rare occurrence in a volatile industry, was no small undertaking. This was clearly very costly for the consumer.



It took an entrepreneur to come to the industry and unify electrical services for all applications while also creating a standardized product (see http://www.eei.org/industry_issues/industry_overview_and_statistics/history for a quick overview). Similarly several visionaries had to step in and define what a Grid computer needed to do in order to create a widely consumable product. While these goals were largely met and several offerings became very successful, Grid computing never really became the firmly rooted utility-like service that we hoped for. Rather, it seems to have become an offering for specialized high-performance computing users.



This market is not the realm of service that I was thinking about at the start of this post. Take television service: this level of service is neither for a single viewer nor for a small business that might want to repackage a set of programs for its customers (say, a sports bar). Rather, it is for large-scale industries whose service requirements are unimaginable to all but a few people. I cannot even draw a parallel to television service. In telecommunications it would be the realm of a CLEC.



Furthermore, unlike my microwave, I am expected to customize my application to work well on a grid. I cannot simply plug it in and get better service than I can from my own PC. It would be the equivalent of choosing to reheat my food on my stove or building my own microwave. You see, my microwave, television service, and phone services are not just basic offerings of food preparation, entertainment, and communication. Instead, these are sophisticated systems that make my work and life easier. Grid computing, while very useful, does not simplify program implementation.



So in steps cloud computing: an emerging technology that seems to have significant overlap with grid computing while also providing simplifying services (something as a service). I may still have to assemble a microwave from pre-built pieces but everything is ready for me to use. I only have to add my personal touches to assemble a meal. It really isn't relevant whether the microwave is central to the task or just one piece of many.



When I approach a task that I hope to solve using a program, how might I plug that in just as easily? Let's quickly consider how services are provided for television. When I plug my application (the TV) into the electricity provider as well as a broadcaster of some sort, it just works. I can change the channel to the streams that I like. I can buy packages that provide me the best set of streams. In addition, some providers will offer me on-demand programming as well as internet and telephone services. If anything breaks, I call a number and they deal with it. None of this requires anything of me. I pay my bill and I get services.



Okay, how would that work for a computation? Say I want to find the inverse for a matrix. I would send out my data to the channel that inverted matrices the way I like them. The provider will worry about attaining the advertised performance, reliability, scalability, security, sustainability, device/location independence, tenancy, and capital expenditure: those characteristics of the cloud that I could not care less about. Additionally, the cloud properties that Rich Wellner assembled don't interest me much either. Certainly they may be differentiators, but the actual implementation is somebody else's problem in the same way that continuous electrical service provision is not my chief concern when I turn on the TV. What I want and will get is an inverse to the matrix I submitted in the time frame I requested deposited where I requested it to be put. I may use the inverted matrix to simultaneously solve for earthquake locations and earth properties or for material stresses and strains in a two-dimensional plate. That is my recipe and my problem.



After all, I should get services "without knowledge of, expertise with, or control over the technology infrastructure that supports them," as the cloud computing wiki page claims. Essentially the aforementioned cloud characteristics are directed towards service providers rather than to the non-expert consumer that highlights the wiki definition. Isn't the differentiator between the Cloud and the Grid the concealment of the complex infrastructure underneath? If the non-expert consumer is expected to worry about algorithm scalability, distributing data, starting and stopping resources and all of that, they certainly will need to gain some expertise quickly. Further, once they have that skill, why wouldn't they just use a mature Grid offering rather than deal with the non-standardized and chaotic clouds? Are these provider-specific characteristics not just a total rebranding of Grid?



As such, I suggest that several consumer-based characteristics should replace the rather inconsequential provider-internal ones that currently exist.



A cloud is characterized by services that:



  • use a specified algorithm to solve a particular problem;
  • can be purchased for one-time, infrequent, or regular use;
  • state their peak, expected, and minimum performance;
  • state the expected response time;
  • can be queried for changes to the expected response time;
  • support asynchronous messaging (a consumer must be able to discover when things are finished);
  • use standard, open, general-purpose protocols and interfaces (clearly);
  • have specified entry points;
  • can interact with other cloud service providers; in particular, a service should be able to send output to long-term cloud-storage providers.


Now that sounds more like Computation-as-a-Service.

Monday, November 3, 2008

Cloud Computing: Commodity or Value Sale?

By Rich Wellner

There is a controversy in the cloud community today about whether the market is going to be one based on value or price. Rephrased: will cloud computing be a commodity or an enablement technology?



A poster on one of the cloud computing lists asserted that electricity would be a key component of pricing. He was then jumped on by people saying that value would be the key.



It seems like folks are talking past one another.



His assertion is true if CC is a commodity.



Now that said, there are precious few commodities in IT. Maybe internet connectivity is one. Monitors might be another. Maybe there are a few more.



But very quickly you get past swappable components that do very nearly the same job and into the realm of 'stuff' that is not easily replaceable. Then the discussion turns to one of value.



Amazon recognized the commodity of books and won the war over people who were trying to sell value. They appear to be attempting to do the same with computer time, which makes the battle they will fight over the next few years with Microsoft (and the increasing number of smaller players) extra interesting.



There is also the problem of making sweeping statements like "the market will figure things out". There is no "the market". Even on Wall Street. The reason things happen is because different people and institutions have different investment goals. Those goals vary over time and create growing or shrinking windows of opportunity for other people and institutions.



I've made my bet on how "the market" for cloud computing will shake out in the short to medium term. Now I'm just hoping that there are enough of the people and institutions my bet is predicated on in existence.