Friday, February 22, 2008

How to Rank High Throughput Computing Environments

By Ignacio Martin Llorente

TOP500 lists computers ranked by their performance on the LINPACK benchmark. It is clear that no single number can reflect the performance of a computer. LINPACK is, however, a representative benchmark for evaluating computing platforms as High Performance Computing (HPC) environments, that is, in the dedicated execution of a single tightly coupled parallel application. An HTC application, on the other hand, comprises the execution of a set of independent tasks, each of which usually performs the same calculation over a subset of parameter values. Although the HTC model is widely used in Science, Engineering and Business, there is no representative benchmark and model to evaluate the performance of computing platforms as HTC environments. At first sight, it could be argued that there is no need for such a performance model. We agree for static and homogeneous systems. However, how can we evaluate a system consisting of heterogeneous and/or dynamic components?



Benchmarking of Grid infrastructures has always been a highly contentious area. The heterogeneity of the components and the large number of layers in the middleware stack make it difficult even to define the aim and scope of a benchmark. A couple of years ago we wrote a paper entitled "Benchmarking of High Throughput Computing Applications on Grids" (R. S. Montero, E. Huedo and I. M. Llorente) for the Parallel Computing journal, presenting a pragmatic approach to evaluating the performance of a Grid infrastructure when running High Throughput Computing (HTC) applications. We demonstrated that the complexity of a whole Grid infrastructure can be represented by only two performance parameters, which can be used to compare infrastructures. The proposed performance model is independent of the middleware stack and valid for any computing infrastructure, so it is also applicable to the evaluation of clusters and HPC servers.



The Performance Model



Our proposal is to follow an approach similar to that used by Hockney and Jesshope to characterize the performance of homogeneous array architectures on vector computations. A first-order description of a Grid can be made by using the following formula for the number of tasks completed as a function of time:



n(t)=R*t-N



Note that, given the heterogeneous nature of a Grid, the execution time of each task can differ greatly, so the following analysis is valid for general HTC applications, where each task may require a distinct instruction stream. The coefficients of the line are:



  • Asymptotic performance (R): the maximum rate of performance, in tasks executed per second. In the case of a homogeneous array of P processors with an execution time per task T, we have R = P/T.
  • Half-performance length (N): the number of tasks required to obtain half of the asymptotic performance. This parameter is also a measure of the amount of parallelism in the system as seen by the application. In the homogeneous case, for an embarrassingly distributed application, we obtain N = P/2.


The above linear relation can be used to define the performance of the system (tasks completed per second) on actual applications with a finite number of tasks:



r(n)=R/(1+N/n)
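As a quick illustration, the two formulas above can be sketched in a few lines of Python. The helper names are ours, and the P and T values are made up for illustration:

```python
# Minimal sketch of the first-order performance model; the helper names
# are ours, and the P and T values below are made-up examples.

def tasks_completed(t, R, N):
    """n(t) = R*t - N: tasks completed as a function of time."""
    return R * t - N

def throughput(n, R, N):
    """r(n) = R / (1 + N/n): performance on an application of n tasks."""
    return R / (1.0 + N / n)

# Homogeneous sanity check: an array of P processors with an execution
# time T per task gives R = P/T and, for an embarrassingly distributed
# application, N = P/2.
P, T = 64, 120.0
R, N = P / T, P / 2.0

print(throughput(N, R, N))        # exactly half of the asymptotic rate
print(throughput(100 * N, R, N))  # approaches R for n >> N
```

Note that at n = N the delivered throughput is R/2 by construction, which is what makes N a convenient single-number summary of how quickly the system "ramps up".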



[Figure: graph of the performance model]



Interpretation of the Parameters of the Model



This linear model can be interpreted as an idealized representation of a heterogeneous Grid, equivalent to a homogeneous array of 2N processors with an execution time per task of 2N/R.



[Figure: equivalence between a heterogeneous Grid and a homogeneous array]



The half-performance length (N), on the other hand, provides a quantitative measure of the heterogeneity of a Grid. This result can be understood as follows: faster processors contribute to a higher degree to the performance obtained by the system. Therefore, the apparent number of processors (2N), from the application's point of view, will in general be lower than the total number of processors in the Grid (P). We can define the degree of heterogeneity (m) as 2N/P. This parameter varies from m = 1 in the homogeneous case to m = 0 when the actual number of processors in the Grid is much greater than the apparent number of processors (highly heterogeneous).



N is a useful characterization parameter for Grid infrastructures in the execution of HTC applications. For example, let us consider two different Grids with a similar asymptotic performance. In this case, by analogy with the homogeneous array, a lower N reflects a better performance (in terms of wall time) per Grid resource, since the same performance (in terms of throughput) is delivered by a smaller “number of processors”.
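To make this concrete, here is a hypothetical sketch of how R and N could be estimated from measured task-completion times by a least-squares fit of the line n(t) = R*t - N, and how the degree of heterogeneity m = 2N/P then follows. The function names and the synthetic workload are ours, not code from the paper:

```python
# Hypothetical sketch: estimate R and N from measured completion times,
# then derive the degree of heterogeneity m = 2N/P. Names and the
# synthetic workload below are illustrative only.

def fit_model(completion_times):
    """Least-squares fit of n = R*t - N to the points (t_i, i), where
    t_i is the wall-clock time at which the i-th task finished."""
    ts = sorted(completion_times)
    ns = range(1, len(ts) + 1)
    mean_t = sum(ts) / len(ts)
    mean_n = sum(ns) / len(ns)
    cov = sum((t - mean_t) * (n - mean_n) for t, n in zip(ts, ns))
    var = sum((t - mean_t) ** 2 for t in ts)
    R = cov / var                # slope: asymptotic performance
    N = R * mean_t - mean_n      # intercept: half-performance length
    return R, N

# Synthetic homogeneous run: P processors, T seconds per task, so tasks
# finish in waves of P; the fit should recover R ~ P/T and N ~ P/2.
P, T, waves = 10, 100.0, 20
times = [T * wave for wave in range(1, waves + 1) for _ in range(P)]
R, N = fit_model(times)
m = 2 * N / P  # ~1 here: the synthetic system is homogeneous
```

On a real trace from a heterogeneous Grid, the same fit would yield m well below 1, quantifying how much smaller the "apparent" processor pool is than the physical one.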



[Figure: comparison of Grid infrastructures]





The Benchmark



We propose the OGF DRMAA implementation of the ED benchmark in the NAS Grid Benchmark suite, with an appropriate scaling to stress the computational capabilities of the infrastructure, as the benchmark to apply the performance model. The ED benchmark comprises the execution of several independent tasks, each of which consists of the execution of the SP flow solver with a different initialization parameter for the flow field. This kind of HTC application can be directly expressed with the DRMAA interface as bulk jobs.



DRMAA is a suitable and portable API for expressing distributed communicating jobs, like the NGB. In this sense, the use of standard interfaces allows comparison between different Grid implementations, since neither NGB nor DRMAA is tied to any specific Grid infrastructure, middleware or tool. DRMAA implementations are available for the following resource management systems: Condor, LSF, Globus GridWay, Grid Engine and PBS.
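As a sketch of how such a bulk job might look in code, the following uses the drmaa-python binding. The solver path, task count and output location are made-up values, and actually running it requires a DRMAA-enabled resource manager:

```python
# Hypothetical sketch of submitting an ED-style parameter sweep as a
# DRMAA bulk job via the drmaa-python binding. The solver path, task
# count and output location are made up; a DRMAA-enabled resource
# manager (e.g. Grid Engine) is needed at runtime.

def ed_task_indices(n_tasks):
    """Bulk-job tasks are indexed 1..n_tasks; each index selects one
    initialization parameter for the flow field."""
    return list(range(1, n_tasks + 1))

def submit_ed_benchmark(solver="/opt/ngb/bin/sp.ed", n_tasks=16):
    import drmaa  # imported here so the sketch loads without a cluster
    session = drmaa.Session()
    session.initialize()
    try:
        jt = session.createJobTemplate()
        jt.remoteCommand = solver
        # Each task picks up its own index from the resource manager
        # (e.g. $SGE_TASK_ID under Grid Engine); per-task output files
        # use the DRMAA parametric-index placeholder.
        jt.outputPath = ":/tmp/ed.out." + drmaa.JobTemplate.PARAMETRIC_INDEX
        job_ids = session.runBulkJobs(jt, 1, n_tasks, 1)
        session.deleteJobTemplate(jt)
        return job_ids
    finally:
        session.exit()
```

The key point for portability is that nothing above names a specific scheduler: the same call works unchanged against any of the DRMAA implementations listed above.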



[Figure: SP tasks expressed with DRMAA]



In the paper we present both intrusive and non-intrusive methods to obtain the performance parameters. The lightweight non-intrusive probes provide continual information on the health of the Grid environment, and thus a way to measure the dynamic capacity of the Grid, which could eventually be used to generate global meta-scheduler strategies.



An Invitation to Action



We have demonstrated in several publications how the first-order model reflects the performance of complex infrastructures running HTC applications. So, why don't we create a TOP500-like ranking of infrastructures? The ranking could be dynamic, obtaining the parameters with the non-intrusive probes. We have all the ingredients:



  • A model representing the achieved performance by using only two parameters: asymptotic performance (R) and half-performance length (N)
  • A benchmark representative of HTC applications: the embarrassingly distributed test included in the NAS Grid Benchmark suite
  • A standard to express the benchmark: OGF DRMAA


Ignacio Martín Llorente
Reprinted from blog.dsa-research.org

Monday, February 18, 2008

How to Monitor Grid Engine

By Sinisa Veseli

You have built and installed your shiny new cluster, installed the Grid Engine software, configured the queues, and announced to the world that your new system is ready to be used. What next? Well, think about your monitoring options…



As users start submitting jobs and hammering the system in every possible way, things will inevitably break on occasion. When something goes wrong in the system, you will want to know about the problem before you start receiving help desk calls and user emails.



The first step in developing an effective strategy for monitoring Grid Engine is learning how to use the available command line tools and how to look for possible issues in the system. Some of the things that you should always pay attention to include:

• queues in the unknown state; a queue instance in an unknown state usually means that the execution daemon is down on that particular host

• queues and jobs in the error state

• configuration inconsistencies

• load alarms

All of the above information can be easily obtained using the qstat command (e.g., try something like “qstat -f -qs uaAcE -explain aAcE”). It is also not difficult to script basic GE monitoring tasks and come up with a simple infrastructure that is able to alert system administrators to any new or outstanding problems in the system.
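As an illustration of such scripting, here is a minimal Python sketch built around the qstat invocation above. The helper names are ours, and the alerting is a placeholder (a print) where a real deployment would hook in mail or paging:

```python
# Minimal sketch of a scripted Grid Engine health check built around
# the qstat invocation discussed above; the alerting is a placeholder.
import subprocess

def problem_lines(qstat_output):
    """With '-qs uaAcE', qstat lists only queues in problem states
    (unknown, alarm, configuration ambiguous, error), so any queue
    line in the output is worth flagging."""
    return [line for line in qstat_output.splitlines() if line.strip()]

def check_grid_engine():
    out = subprocess.run(
        ["qstat", "-f", "-qs", "uaAcE", "-explain", "aAcE"],
        capture_output=True, text=True,
    ).stdout
    for line in problem_lines(out):
        print("ALERT:", line)  # replace with mail/pager integration
```

Run from cron every few minutes, even something this small catches downed execution daemons well before the help desk calls start.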



As your user base grows, so will your monitoring needs, and you will likely want to extend your monitoring tools. You should consider looking into existing software packages like xml-qstat, which uses XSLT transformations to render Grid Engine command line XML output into different output formats. Alternatively, you can also develop a set of your own XSL stylesheets that are customized to your needs, and use widely available command line tools such as xsltproc to generate monitoring web pages from the “qstat -xml” output.



Another interesting Grid Engine monitoring option is the Monitoring Console that comes with Cluster Express (CE). Its main advantage is that it integrates monitoring data from several different sources: Ganglia (system data), the Grid Engine qmaster, and the ARCo database (job data). However, even though Cluster Express by itself is easy to install, at the moment integrating the CE Monitoring Console with an existing Grid Engine installation requires a bit of work. I am told that this will be much simplified in the upcoming CE release. In the meantime, if you are really anxious to try the CE Monitoring GUI on your Grid Engine cluster, do not hesitate to send me an email…

Thursday, February 14, 2008

My Head in the Clouds

By Rich Wellner

Up until I read Ian Foster’s Cloud Computing post, I had paid little attention to what the term meant to people.  Personally, I had already chalked up the idea as a rebranding of Grid computing.  So I asked a number of friends what they thought the differences between the two were.  Of course, many people not actively involved in the community are not familiar with either concept.   (I find it peculiar that computer professionals know the latest buzz around SaaS and SOA but do not seem to consider how and where they might land these systems.)  In any event, here is a summary of the answers I received:



  • Grid is characterized by more formalized computing arrangements between user and vendor whereas a Cloud is more for ad hoc resource utilization;
  • The types of computation on a Grid are typically parallel in nature whereas Clouds are for simpler calculations;
  • Grid was usurped by vendors to indicate that their services were distributed for better performance and reliability while Cloud had become the term for a generalized set of distributed resources;
  • Clouds are ethereal – anybody who watches them as they cross the sky knows that…




I found it rather interesting that while there was some overlap of technical perspectives, I did not get any answers that were identical.  I believe that the last description explains this situation nicely while also offering the most interesting take on the topic.  A Cloud is a nuanced term that invokes the idea of something beautiful which also evolves rapidly, contains a lot of power, and then is gone.  Meanwhile the Grid, like the utilities it was conceived from, is known for its reliability, its ability to tap into reserve power on a moment's notice, and its accommodating levels of service.  Notice how the first two answers adhere to this concept?  I don’t think this is an accident, nor do I think that this escaped the attention of the marketing departments of industry leaders like Amazon and Google, both of whom operate in what they term Cloud space.  While it is distinctly possible that the term organically evolved, it is interesting that they chose to stick with it.



I also found it particularly noteworthy that not one person I queried mentioned the amount of data to be processed. Foster, the Business Week article he references, and many others suggest that we need to think in terms of a great deal more data than we have before.  For example, Google wants their people to think in terms of a thousand times more data than they are accustomed to.



Heck, I was thinking about writing about the so-called “Data Tsunami” myself – but not in terms of thinking about significantly larger datasets. The datasets we were working with a decade ago were suitably massive for what we were trying to accomplish.  Like today, it was not economically feasible to keep it all online at once.  The fact is that the incredible leaps in computational capacity have led us to build more complicated problems that demand still more data.  As such, a thousand times more data is probably still not enough.  If only the networking and storage companies had kept up with the leaps in processing capacity (I was on a gigabit network five years ago and I am using a gigabit network today >sigh<).



Consequently, we still have to use the tried and true standard operating procedure of:



  • First, carefully select a fraction of the data available for storage (i.e., triggering);
  • Next, prune this rough dataset further by pre-processing to find the most interesting records;
  • Finally, perform detailed analysis on what is computationally feasible.
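The three steps above can be sketched as a staged reduction. The "events" and predicates below are toy stand-ins, not real physics:

```python
# Toy sketch of the three-stage reduction: trigger, pre-process,
# analyze. The events and predicates are made up; real triggers run
# in hardware at far higher rates than anything shown here.

def reduce_pipeline(events, trigger, interesting, analyze):
    stored = (e for e in events if trigger(e))      # 1) triggering
    pruned = (e for e in stored if interesting(e))  # 2) pre-processing
    return [analyze(e) for e in pruned]             # 3) detailed analysis

events = [{"energy": e, "tracks": e % 7} for e in range(1000)]
results = reduce_pipeline(
    events,
    trigger=lambda e: e["energy"] > 900,     # store a small fraction
    interesting=lambda e: e["tracks"] >= 5,  # prune further
    analyze=lambda e: e["energy"] * e["tracks"],
)
```

Each stage throws away most of what it receives, which is exactly why the full stream never has to fit in storage at once.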




For example, the Large Hadron Collider will be examining billions of collisions per second but will only store a few hundred per second for later processing (see the recent article on Dr. Heuer). We have always needed to think in terms of a thousand times more data than we can possibly process or become accustomed to.  Basically, the scope of what is economically feasible has changed dramatically over the last few decades, while we continue to be quite resource constrained.  Which brings us back to the concept of capacity computing, whether in the form of a transitory Cloud, a steadfast Grid, or even the comfortable @home project.  The key here is that people are continuing to push past the boundaries of what is feasible.

Wednesday, February 13, 2008

The Hot Trend for 2008

By Rich Wellner

Fortune 500 businesses, and many smaller ones, run clusters in many different locations around the world.

As facility, management, and other costs continue to claim larger and larger shares of corporate IT budgets, networking costs continue to fall. The result is that data center consolidation becomes a more reasonable goal.



I've seen this in a few of my customers. Beginning last year, people started looking more toward grid technology to help them manage this. As the economy has tightened, more people have considered this approach, particularly as part of a plan to reduce costs by moving to open source tools.



The general pattern is that the IT group decides they need to find ways to more effectively manage large and disconnected sets of resources. They turn to grid computing to help them manage that cloud and in the process realize that they have a lot of special purpose machines that are being quite underutilized and that they have enormous duplication of effort in the management of those data centers.



As we've entered into a bear market, many companies are taking a second look at their IT costs and looking for ways to tighten their belts. The combination of open source and grid/cloud computing models offers the ability to do that, with open source offering a lower cost software acquisition model and grid computing allowing a reduction in IT staff through centralization.



I've also been working with folks on the lost art of environment management. But more on that in a future blog...

Friday, February 8, 2008

How to Build Utility Computing Infrastructures with Globus

This is a guest post by Ignacio Martín Llorente, Professor of Distributed Systems Architectures at Universidad Complutense de Madrid.

While research institutions are interested in Partner Grids, which provide access to higher computing performance to satisfy peak demands and support collaborative projects, enterprises understand grid computing as a way to address the changing service needs of an organization. They are interested in in-house resource sharing, to achieve a better return on their information technology investment, supplemented by outsourced resources to satisfy peak or unusual demands. An Outsourced/Utility Grid would provide pay-per-use computational power when Enterprise Grid resources are overloaded. Such a hierarchical grid organization may be extended recursively to federate a higher number of Partner or Outsourced Grid infrastructures with consumer/provider relationships. This would allow resources to be supplied on demand, making resource provision more agile and adaptive. It would therefore offer access to a potentially unlimited computational capacity, transforming IT costs from fixed to variable.




In the context of the GridWay project, we have developed a Grid Gateway that exposes a WSRF interface to a metascheduling instance, thus enabling the creation of hierarchical grid structures. GridGateWay consists of a set of Globus services hosting a GridWay metascheduler, providing a uniform, standard interface for the secure and reliable submission, monitoring and control of jobs. Most functionality is provided through GRAM (Grid Resource Allocation and Management), while scheduling information is provided through MDS (Monitoring and Discovery Service). The security requirements at the user level are addressed by GSI (Grid Security Infrastructure).

The new technology allows different layers of metaschedulers to be arranged in a hierarchical structure. In this arrangement, each target grid is handled as just another resource; that is, the underlying grid is characterized as a single resource in the source grid by means of grid gateways. This strategy encourages companies to federate their grids in order to obtain a better return on IT investment and to satisfy peak demands for computation. Furthermore, this solution allows for gradual deployment (from fully in-house to fully outsourced), in order to deal with the obstacles to grid technology adoption, such as enterprise scepticism and IT staff resistance.
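As a toy illustration of the grid-as-a-resource idea (not actual GridWay code), a gateway can be modeled as a resource that wraps a whole underlying grid while advertising itself as a single entry in the scheduler's resource list; all names and numbers below are made up:

```python
# Toy illustration (not GridWay code): in a hierarchy of metaschedulers,
# an entire underlying grid appears to the layer above as one resource.

class Resource:
    def __init__(self, name, free_slots):
        self.name = name
        self.free_slots = free_slots

class GridGateway(Resource):
    """Advertises a whole underlying grid as a single resource."""
    def __init__(self, name, underlying_grid):
        total = sum(r.free_slots for r in underlying_grid)
        super().__init__(name, total)
        self.underlying_grid = underlying_grid

# An enterprise grid federated with an outsourced (utility) grid:
enterprise = [Resource("cluster-a", 8), Resource("cluster-b", 4)]
utility = GridGateway("utility-provider",
                      [Resource("remote-1", 64), Resource("remote-2", 32)])
scheduler_view = enterprise + [utility]  # the gateway looks like one node
total_slots = sum(r.free_slots for r in scheduler_view)
```

Because a GridGateway is itself a Resource, gateways can wrap other gateways, which is precisely what allows the hierarchy to extend recursively.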

This approach also provides the components required for interoperability between existing Grid infrastructures. It is clear that we can’t wait for a single global grid to arise or become predominant. Instead, we should work to build a seamless integration of the existing grids, which may eventually constitute the ultimate, capital-letter Grid, Grid of grids, or InterGrid, in the same way that the Internet was born. Grid interoperability can be achieved by means of common, ideally standard, grid interfaces, whose existence is an important (if not essential) characteristic of grid systems. Unfortunately, common interfaces (let alone standard ones) are not always available for given services, and then the use of grid adapters and gateways becomes necessary. In particular, an interoperability solution based on grid gateways provides the infrastructures with significant benefits in terms of autonomy, scalability, deployment and security.

Well, what are you waiting for? The components are open source, the license is Apache v2.0, and we are willing to collaborate with you.

Wednesday, February 6, 2008

Six, No, Eight! Reasons to Deploy Virtualization

By Rich Wellner

Dan Kusnetzky has a good article about why people use virtualization. He distills things down to the following six goals:



  • Higher Performance
  • Increased Scalability
  • Increased Reliability or Availability
  • Workload Consolidation
  • Application Agility
  • Unified Management Domain


I'd probably add a couple that have some overlap, but are worth calling out independently:



  • Reduction of Vendor Lock-in
  • Disaster Recovery

Tuesday, February 5, 2008

How To Write Your Own Load Sensors For Grid Engine

By Sinisa Veseli

As most Grid Engine (GE) administrators know, the GE execution daemon periodically reports values for a number of host load parameters. Those values are stored in the qmaster internal host object, and are used internally (e.g., for job scheduling) if a complex resource attribute with a corresponding name is defined.



Parameters that are reported by default (about 20 or so) are documented in the file $SGE_ROOT/doc/load_parameters.asc. For a large number of clusters the default set is sufficient to adequately describe their load situation. However, for those sites where this is not the case, the Grid Engine software provides administrators with the ability to introduce additional custom load parameters. Accomplishing this task is not difficult, and involves three steps:



1) Provide a custom load sensor. This can be a script or a binary that feeds the GE execution daemon with additional load information. It must comply with the following rules:



• It should be written as an infinite loop that waits for user input from the standard input stream.

• If the string “quit” is received, the sensor should exit.

• Otherwise, it should retrieve data necessary for computing the desired load figures, calculate those, and write them to the standard output stream.

• The individual host-related load figures should be reported one per line, in the form host:name:value (without any blanks). The load figures should be enclosed within a pair of lines containing only the strings “begin” and “end”. For example, a custom load sensor running on the machine tolkien.univaud.com and measuring the parameters n_app_users and n_app_threads might produce the following output:



begin

tolkien.univaud.com:n_app_users:12

tolkien.univaud.com:n_app_threads:23

end



Note that for global consumable resources not attached to a particular host (such as the number of used floating licenses), the load sensor needs to output the string “global” instead of the machine name.



2) For each custom load parameter, define a complex resource attribute using, for example, the “qconf -mc” command.

3) Enable the custom load sensor by executing the “qconf -mconf” command and providing the full path to your script or executable as the value of the “load_sensor” parameter. If all goes well, the execution daemon will start reporting the new load parameters within a minute or two, and you should be able to see them using the “qhost -F” command.
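As a sketch of step 1, the following Python load sensor implements the protocol described above. The reported metric (the one-minute load average, published under the hypothetical name my_load_avg) is purely illustrative:

```python
# Sketch of a custom load sensor implementing the protocol described
# above. The metric (one-minute load average, reported under the
# hypothetical name my_load_avg) is illustrative only.
import os
import socket
import sys

def format_report(host, values):
    """Render one begin/end-delimited report, one host:name:value
    line per metric, as the execution daemon expects."""
    lines = ["begin"]
    lines += ["%s:%s:%s" % (host, name, value) for name, value in values]
    lines.append("end")
    return "\n".join(lines)

def main():
    host = socket.gethostname()
    while True:
        line = sys.stdin.readline()  # execd prompts once per interval
        if not line or line.strip() == "quit":
            break
        load1, _, _ = os.getloadavg()
        print(format_report(host, [("my_load_avg", round(load1, 2))]))
        sys.stdout.flush()           # execd needs the reply immediately

if __name__ == "__main__":
    main()
```

The explicit flush matters: the execution daemon reads the sensor's output between "begin" and "end" each cycle, so buffered output would stall the reports.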



Administrators with decent scripting skills (or those with a bit of luck ☺) can usually implement and enable new load sensors for their Grid Engine installations in a very short period of time. Note that some simple examples of custom load sensors can be found in the Grid Engine Administration Guide, as well as in the corresponding HowTo document.