Friday, April 15, 2011

CUDA Load Sensor for Grid Engine


Do you use Nvidia CUDA on Grid Engine? Here is a GPU load sensor with CUDA support that lets Grid Engine schedule jobs according to your current CUDA capacity and capabilities.

It has been tested against SGE 6.2u5 and UGE 8.0.0, but should work with any Grid Engine 6.x or later. It has been built against CUDA 2.2, 2.3 and 3.2.

There is still some work to do to tune how the load values are set and what types they are. I would like to hear from anyone trying it out, along with suggestions on how best to use it to launch jobs.
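For anyone who has not written a load sensor before, here is a rough Python sketch of the general shape: the execution daemon writes a line to the sensor's standard input whenever it wants fresh values, and the sensor answers with a block of host:name:value lines wrapped in "begin" and "end", exiting when it reads "quit". This is not the code from the linked repository. The function names and the cuda_free_mem value name are made up for illustration, and the values would have to match complexes defined in your own cluster.

```python
def format_report(host, values):
    """Build one load report: 'begin', one host:name:value line per value, 'end'."""
    lines = ["begin"]
    for name, value in values.items():
        lines.append(f"{host}:{name}:{value}")
    lines.append("end")
    return "\n".join(lines) + "\n"


def run_sensor(stdin, stdout, host, read_values):
    """Answer each request line with a fresh report until 'quit' arrives.

    read_values is a hypothetical callback returning a dict of load values,
    for example {"cuda_free_mem": 1024}. A real sensor would query the GPU here.
    """
    for line in stdin:
        if line.strip() == "quit":
            break
        stdout.write(format_report(host, read_values()))
        # Flush so the daemon sees the report immediately.
        stdout.flush()
```

To try it by hand, call run_sensor(sys.stdin, sys.stdout, socket.gethostname(), your_callback) from a small script, press Enter to request a report and type quit to stop. The host name should be the one Grid Engine knows the node by.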

Grab it here
https://github.com/StephenDennis/gpu-load-sensor


Photo: 1971 (probably) Plymouth Barracuda pic from Dominics Pics. V8 Engine....coincidence?

Tuesday, April 12, 2011

Grid Engine 8.0.0 Announced

A drop-in replacement for SGE 6.2u5 is available now. See Univa's announcement. That's 'drop in' as in no stopping your grid....

Tuesday, April 5, 2011

Cycle Computing creates 10k core Condor cluster on a click

A single click to start a 10k core Condor cluster that costs about a kilobuck an hour is simply an amazing accomplishment. The bar for HPC in the cloud has been reset dramatically.

Cycle's article here.

Wednesday, March 30, 2011

New Grid Engine Logo


Grid Engine got a new home recently and now has a new logo.

Grid Gurus moved to Blogger

Sort of obvious if you are reading this. I just moved Grid Gurus to Blogger because, well, I like it more than TypePad. This post is really just to mark the occasion. I have preserved most of the articles and put new bylines on all the old posts.

Thursday, March 24, 2011

New Guru, Stephen Dennis!

After a somewhat extended hiatus, Grid Gurus is restarting. I am Stephen Dennis. I work as a sales engineer for Univa, the new home of Grid Engine. As the new guru I will be posting hints, news, ideas and so forth about Grid Engine and high performance computing.

I have been working in technical computing since 1990. I have been very focused on Grid Engine for the last three years, helping many customers in semiconductors, electronics, aerospace, industrial, scientific computing and bio in North America, EMEA, and Asia. I have helped several customers migrate from Platform LSF and helped build a migration tool kit which provides an LSF experience on Grid Engine. I have worked with the core Grid Engine engineering team in Regensburg when they worked with Sun, and then Oracle, and now with Univa.

I am super excited about helping to grow Grid Engine in its new home.

Here are a couple of links to new Grid Engine resources.

Wednesday, May 6, 2009

Parsing SGE Accounting File

By Rich Wellner

Anyone managing an HPC cluster has probably wondered at some point about the overall performance and usage of his/her cluster. How many jobs were completed last month, what was the average job duration time, how long were they pending in queue, how many CPU slots did jobs require…? These are all good questions with answers buried somewhere in your DRM’s accounting files.

If you are using Grid Engine, and assuming you have the usual “default cell” installation, the relevant file is $SGE_ROOT/default/common/accounting. The corresponding command that extracts information from this file is “qacct”. When you type something like “man qacct”, you will notice that qacct produces a summary of wall-clock, CPU and system time, broken down by categories such as hostname, queue-name, owner-name, etc., so there is a good chance that the information you are looking for is readily available.

If, however, you happen to be looking for something that qacct does not provide, the accounting file is formatted for easy parsing. Each line in the file corresponds to one computing task, and there are more than 40 different accounting fields (separated by the ‘:’ character) on each line. The meaning of each field is documented in the man pages (“man accounting”), so getting the information you need with standard UNIX tools should not be difficult at all.
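As an illustration of how little code this takes, here is a minimal Python sketch that totals job count, wall-clock time and pending time per owner. It assumes the leading fields appear in the order given by “man accounting” for SGE 6.2 (qname, hostname, group, owner, job_name, job_number, account, priority, submission_time, start_time, end_time, failed, exit_status, ru_wallclock) and that times are in seconds; check the man page for your version before relying on it. The function names are made up for this example.

```python
from collections import defaultdict

# Leading accounting fields, in the order assumed from "man accounting" (SGE 6.2).
# Verify against your own installation; later fields are ignored here.
FIELDS = [
    "qname", "hostname", "group", "owner", "job_name", "job_number",
    "account", "priority", "submission_time", "start_time", "end_time",
    "failed", "exit_status", "ru_wallclock",
]


def parse_line(line):
    """Return a dict of the leading fields, or None for blank, comment or short lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    parts = line.split(":")
    if len(parts) < len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))


def summarize_by_owner(lines):
    """Total jobs, wall-clock time and pending time (start minus submission) per owner."""
    totals = defaultdict(lambda: {"jobs": 0, "wallclock": 0.0, "pending": 0.0})
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue
        entry = totals[record["owner"]]
        entry["jobs"] += 1
        entry["wallclock"] += float(record["ru_wallclock"])
        entry["pending"] += float(record["start_time"]) - float(record["submission_time"])
    return dict(totals)
```

To run it against a real file, open $SGE_ROOT/default/common/accounting and pass the file object to summarize_by_owner. Dividing the totals by the job count gives the average duration and average time pending in queue.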