Technical and Scientific Computing with Grid Engine

Wednesday, September 17, 2008

The OpenNebula Engine for Data Center Virtualization and Cloud Solutions

By Ignacio Martin Llorente

Virtualization has opened up avenues for new resource management
techniques within the data center. Probably, the most important
characteristic is its ability to dynamically shape a given hardware
infrastructure to support different services with varying workloads.
Therefore, effectively decoupling the management of the service (for
example a web server or a computing cluster) from the management of the
infrastructure (e.g. the resources allocated to each service or the
interconnection network).

A
key component in this scenario is the virtual machine manager. A VM
manager is responsible for the efficient management of the virtual
infrastructure as a whole, by providing basic functionality for the
deployment, control and monitoring of VMs on a distributed pool of
resources. Usually, these VM managers also offer high availability
capabilities and scheduling policies for VM placement and physical
resource selection. Taking advantage of the underlying virtualization
technologies and according to a set of predefined policies, the VM
manager is able to adapt the physical infrastructure to the services it
supports and their current load. This adaptation usually involves the
deployment of new VMs or the migration of running VMs to optimize their
placement.

The dsa-research group
at the Universidad Complutense de Madrid has released under the terms
of the Apache License, Version 2.0, the first stable version of the OpenNebula Virtual Infrastructure Engine.
OpenNebula enables the dynamic allocation of virtual machines on a pool
of physical resources, so extending the benefits of existing
virtualization platforms from a single physical resource to a pool of
resources, decoupling the server not only from the physical
infrastructure but also from the physical location. OpenNebula is a
component being enhanced within the context of the RESERVOIR European Project.

The new VM manger differentiates from existing VM managers in its
highly modular and open architecture designed to meet the requirements
of cluster administrators. OpenNebula 1.0 supports Xen and KVM
virtualization platforms to provide several features and capabilities
for VM dynamic management, such as centralized management, efficient
resource management, powerful API and CLI interfaces for monitoring and
controlling VMs and physical resources, fault tolerant design... Two of
the outstanding new features are its support for advance reservation
leases and on-demand access to remote cloud provider

Support for Advance Reservation Leases

Haizea
is an open source lease management architecture that OpenNebula can use
as a scheduling backend. Haizea uses leases as a fundamental resource
provisioning abstraction, and implements those leases as virtual
machines, taking into account the overhead of using virtual machines
(e.g., deploying a disk image for a VM) when scheduling leases. Using
OpenNebula with Haizea allows resource providers to lease their
resources, using potentially complex lease terms, instead of only
allowing users to request VMs that must start immediately.

Support to Access on-Demand to Amazon EC2 resources

Recently, virtualization has also brought about a new utility
computing model, called cloud computing, for the on-demand provision of
virtualized resources as a service. The Amazon Elastic Compute Cloudi
s probably the best example of this new paradigm for the elastic
capacity providing. Thanks to virtualization, the clouds can be used
efficiently to supplement local capacity with outsourced resources. The
joint use of these two technologies, VM managers and clouds, will
change arguably the structure and economics of current data centers.
OpenNebula provides support to access Amazon EC2 resources to
supplement local resources with cloud resources to satisfy peak or
fluctuating demands.

Scale-out of Computing Clusters with OpenNebula and Amazon EC2

As use case to illustrate the new capabilities provided by OpenNebula, the release includes documentation
about the application of this new paradigm (i.e. the combination of VM
managers and cloud computing) to a computing cluster, a typical data
center service. The use of a new virtualization layer between the
computing cluster and the physical infrastructure extends the classical
benefits of VMs to the computing cluster, so providing cluster
consolidation, cluster partitioning and support for heterogeneous
workloads. Moreover, the integration of the cloud in this layer allows
the cluster to grow on-demand with additional computational resources
to satisfy peak demands.

I gnacio Martín Llorente

Reprinted from blog.dsa-research.org

Tuesday, September 16, 2008

Cloud Caucusing

By Rich Wellner

Several months ago on this blog, I mused on what was meant by the term cloud computing. At the time, it was even more difficult than it is today to get a solid definition of the concept. Since then, many opinions have been bandied about providing plenty of fuel for the debate. While I think the concept has solidified some, cloud computing remains a highly polysemous term where folks from different backgrounds have developed their own definitions based upon their particular worldviews. These viewpoints come from vendors, specialists, researchers, as well as different user communities.

Although a unified definition for cloud computing has not emerged, the concept has gained a lot of traction. I believe that this is because each interested-group has found significant promise in what they call the cloud. Of course anything with this much possibility will certainly see some hype. As I have said, before: the term invokes thoughts of transient beauty and power: even marketing folks can get excited with this one! (Compare that to SaaS).

In any event, I thought that I would give you a quick idea of the types of discussions going on around cloud computing on the internet:

Twenty Experts Define Cloud Computing;
The Next Perfect IT Storm;
Google Groups “Discussion on the-definition-of-a-cloud-of-computers”;
Cloud Computing Promise & Reality (from which we learned, “There is a clear consensus that there is no real consensus on what cloud computing is.” Bob Buderi, founder and CEO of Xconomy);
Cloud Computing Hype versus Reality;
Wiki Definition.

Compare these to one of the earliest usages of the term (search for cloud). Clearly, these documents are far from a representative set of the discussions going on out there. It just so happened that I selected a few from those I have read lately. There really is a lot going on out there.

Ultimately I expect to see many types of formalized clouds, each depending on their
operating environments and behaviors — just like I see when I look outside my
window. Once that happens, the big debates about how to interoperate between clouds of very different nature will begin. Transforming a concept into a widely accepted framework is never easy. After all, why should I have to bend my perfect cloud so that it works with yours?

So what is the upside of all this banter? It turns out that the less often a word is used, the faster it evolves. Ironically, the hype may actually force this community into consensus. As long as we keep this dialog going, we should expect a formalized cloud to come about in no time!!!

Thursday, September 4, 2008

A Cloud by Any Other Name

By Rich Wellner

The cloud list on google has been buzzing lately about the term "Enterprise Cloud" and whether it had any significance.

I had to chuckle as history started to repeat itself again between the early days of the grid and the early days of the cloud.

In our book Pawel and I wrote a section titled "How the Market Understands Grids". We didn't try to dictate terms, we tried to document the language in place at that moment in time.

In interviewing users we gathered the following terms:

Clusters -- Computers standing together, but accessible only to a small group of people
Departmental grids -- Multiple clusters accessible on a common backplane, but owned by one department
Enterprise grids -- Corporate resources available to all in the company (known today as a Enterprise Cloud)
Partner grids -- A few companies working together on big problems and sharing resources to accomplish their goals.
Open grids -- Many organizations making resources available to other members of that grid. A key distinction between an open grid and a partner grid is that an open grid doesn't typically have a key application or goal while a partner grid does.

We blanched a bit because to us grid computing meant only the last definition and we viewed those other ones as missing some key attributes that those of us who had been working in the grid field since its inception thought were really important.

We see the same thing happening today with the term cloud and particularly in the term Enterprise Cloud.

That said, is Enterprise Cloud really an oxymoron, as one person suggested?

First we have to get to definitions:

Here are the key characteristics from the cloud computing wiki:

Capital expenditure minimized and thus low barrier to entry as infrastructure is owned by the provider and does not need to be purchased for one-time or infrequent intensive computing tasks. Services are typically being available to or specifically targeting retail consumers and small businesses.
Device and location independence which enables users to access systems regardless of location or what device they are using (eg PC, mobile).
Multitenancy enabling sharing of resources (and costs) among a large pool of users, allowing for:
- Centralization of infrastructure in areas with lower costs (eg real estate, electricity)
- Peak-load capacity increases (users need not engineer for highest possible load levels)
- Utilization and efficiency improvements for systems that are often only 10-20% utilised.
Performance is monitored and consistent but can be affected by insufficient bandwidth or high network load.
Reliability by way of multiple redundant sites, which makes it suitable for business continuity and disaster recovery, however IT and business managers are able to do little when an outage hits them.
Scalability which meets changing user demands quickly, without having to engineer for peak loads. Massive scalability and large user bases are common but not an absolute requirement.
Security which typically improves due to centralization of data, increased security-focused resources, etc. but which raises concerns about loss of control over certain sensitive data. Accesses are typically logged but accessing the audit logs themselves can be difficult or impossible.
Sustainability through improved resource utilisation, more efficient systems and carbon neutrality.

None of those seem to exclude the term Enterprise Cloud.

Here's the list of attributes I compiled from the cloud google group and others IRL:

Multiple vendors accessible through open standards and not centrally
administered
Non-trivial QOS (see the gmail debate thread)
On demand provisioning
Virtualization
The ability for one company to use anothers resources (e.g. bobco
using ec2)
Discoverability across multiple administrative domains (e.g.
brokering to multiple cloud vendors)
Data storage
Per usage billing
Resource metering and basic analytics
Access to the data could me bandwidth/latency limitations, security,
Compliance – Architecture/implementation, Audit, verification
Policy based access – to data, applications and visibility
Security not only for data but also for applications

Now here we start to see some things that aren't applicable to enterprise clouds (i.e. 1, 5, 6). But the bulk of the list still works. And it's worth noting that EC2 fails on four of those things (i.e. 1, 11, 12, 13), but people don't hesitate to allow them the use of the term cloud.

In previous technology revolutions I learned the lesson (slowly) to not care so much what things are called as much as what they do (which was why, in my early writings on this group I was trying to point out to people (mostly unsuccessfully) that there are lessons to be learned from grid computing). But claiming there is a canonical definition of cloud and that enterprise cloud is a nonsense term doesn't seem accurate on the face of things. Enterprise Cloud does, however capture the essence of what many large corporate IT groups are doing/considering. Rather than telling them they shouldn't be calling it cloud/grid/enterprise cloud/managed services/SaaS/whatever, I'm taking the approach of helping them meet their business needs, with technology wearing a variety of banners, and letting them call it whatever they like.

Monday, July 21, 2008

I have a Theory

By Roderick Flores

It was with great curiosity that I read Chris Anderson's article on the end of theory. To summarize his position, the "hypothesize, model, and test" approach to science has become obsolete now that there are petabytes of information and countless numbers of computers capable of processing that data. Further, this data-tsunami has made the search for models of real-world phenomena pointless because, "correlation is enough."

The first thing that struck me as ironic about this argument is that statistical correlation is itself a model including all of its associated simplified and assumptive baggage. Just how do I assign a measure of similarity between a set of objects without having a mathematical representation (i.e. a model) of those things? How might I handle strong negative-correlation in this analysis? What about the null hypothesis? While not interesting, per se, it is useful information. Will a particular measurement be allowed to correlate with more than a single result-cluster?

Additionally, we must decide how to relate these petabytes of measurements into correlated-clusters. As before, the statistics that are used to calculate correlation are also models. Are we considering Gaussian distributions, scale-invariant power-laws, or perhaps a state-driven sense of probability? Are we talking about events that have a given likelihood such as the toss of a coin or, more likely, subjective plausibility? You need to be very cautious when choosing your statistical model. For example, using a bell-curve to describe unbounded-data destroys any real sense of correlation.

Regardless of how you statistically model your measurements, you must understand your data lest your correlations may not make sense. For example, imagine that I have two acoustic time-series. How do I measure the correlation of these two recordings to determine how well the are related? The standard approach is to simply convolve the two signals and look for a value that indicates “significant correlation”, whatever your model for that turns out to be. Yet this doesn't mean much unless I understand my data. Were each of these time-series recorded at the same sampling rate? For example, if I have 20 samples of a 10Hz sine-wave recorded at 100 samples per second it will appear exactly the same as 20 samples of a 5Hz sine-wave recorded at 50 samples per second. If I naively plot the samples, they will correlate perfectly. Basically, if I don't understand my data, I can easily erroneously report that the correlation of the two signals is perfect when in fact they have zero correlation.

Finally, what I find most intriguing is the presumption that the successful correlation of petabytes of data culled web-pages and the associated viewing habits data somehow generalizes into a method for science in general. Unlike the “as-seen on TV” products I see in infomercials, statistical inference is not the only tool that I will ever need. Restricting ourselves to correlation removes one of the most powerful tools we have: prediction. Without it, scientific discovery would be hobbled.

Consider, the correlation of all of the observed information regarding plate-boundary movement (through some model of the earth) along a fault such as the San Andreas. Keep in mind that enormous amounts of data are collected in this region. Anyway, quiet areas along the fault would either imply that a particular piece of the fault were no longer seismically-active or, using anti-correlation, that the “slip deficit” suggested that a much larger earthquake was more likely to occur in the future for that zone (These areas are referred to as seismic gaps). Moreover, the Parkfield segment of the San Andreas fault has large earthquakes approximately every twenty years. A correlative model would suggest that the entire plate-boundary should be similar which is simply not true as proven by the Anza Seismic Gap. Furthermore, correlation would also have implied that another large event should have occurred along the Parkfield Gap in the late 80s. If science were only concerned with correlation, one instrument in this zone would have been sufficient. However, the diverse set of predictions made by researchers demanded a wide variety of experiments. Consequently, this zone became the most heavily instrumented area in the world in an effort to extensively study the expected large event. They had to wait for over fifteen years for this to happen. Then there are events that few would have predicted (Black Swans) such as “slow” earthquakes which require special instrumentation to capture. These phenomena, until recently, were not able to be correlated with anything and thus, never would have existed. In fact, one of the first observations of these events was attributed to instrument error.

Clearly correlation is but one approach to modeling processes amongst many. I have a theory that we in the grid community can expect to help scientists solve many different types of theoretical problems for a good long time. Now to test...

Monday, June 30, 2008

About Grid Engine Advance Reservations

By Sinisa Veseli

Advance reservation (AR) capability is one of the most important new features of the upcoming Grid Engine 6.2 release. New command line utilities allow users and administrators to submit resource reservations (qrsub), view granted reservations (qrstat), or delete reservations (qrdel). Also, some of the existing commands are getting new switches. For example, the “-ar <AR id>“ option for qsub indicates that the submitted job is a part of an existing advanced reservation. Given that AR is a new functionality, I thought that it might be useful to describe how it works on a simple example (using 6.2 Beta software).

Advanced resource reservations can be submitted to Grid Engine by queue operators and managers, and also by a designated set of privileged users. Those users are defined in ACL “arusers”, which by default looks as follows:

$ qconf -sul
arusers
deadlineusers
defaultdepartment

$ qconf -su arusers
name    arusers
type    ACL
fshare  0
oticket 0
entries NONE

The “arusers” ACL can be modified via the “qconf -mu” command:

$ qconf -mu arusers
veseli@tolkien.ps.uud.com modified "arusers" in userset list

$ qconf -su arusers
name    arusers
type    ACL
fshare  0
oticket 0
entries veseli

Once designated as a member of this list, the user is allowed to submit ARs to Grid Engine:

[veseli@tolkien]$ qrsub -e 0805141450.33 -pe mpi 2
Your advance reservation 3 has been granted

[veseli@tolkien]$ qrstat
ar-id   name       owner        state start at             end at               duration
-----------------------------------------------------------------------------------------
      3            veseli       r     05/14/2008 14:33:08  05/14/2008 14:50:33  00:17:25

[veseli@tolkien]$ qstat -f 
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@tolkien.ps.uud.com       BIP   2/0/4          0.04     lx24-x86

For the sake of simplicity, in the above example we have a single queue (all.q) that has 4 job slots and a parallel environment (PE) mpi assigned to it. After reserving 2 slots for the mpi PE, there are only 2 slots left for running regular jobs until the above shown AR expires. Note that the "–e" switch for qrsub designates requested reservation end time in the format YYMMDDhhmm.ss. It is also worth pointing out that the qstat output changed slightly with respect to previous software releases in order to accommodate display of existing reservations.

If we now submit several regular jobs, only 2 of them will be able to run:

[veseli@tolkien]$ qsub regular_job.sh 
Your job 15 ("regular_job.sh") has been submitted
...
[veseli@tolkien]$ qsub regular_job.sh 
Your job 19 ("regular_job.sh") has been submitted

[veseli@tolkien]$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@tolkien.ps.uud.com       BIP   2/2/4          0.03     lx24-x86      
     15 0.55500 regular_jo veseli       r     05/14/2008 14:34:32     1        
     16 0.55500 regular_jo veseli       r     05/14/2008 14:34:32     1

############################################################################
- PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
     17 0.55500 regular_jo veseli       qw    05/14/2008 14:34:22     1        
     18 0.55500 regular_jo veseli       qw    05/14/2008 14:34:23     1        
     19 0.55500 regular_jo veseli       qw    05/14/2008 14:34:24     1

However, if we submit jobs that are part of the existing AR, those are allowed to run, while jobs submitted earlier are still pending:

[veseli@tolkien]$ qsub -ar 3 reserved_job.sh 
Your job 20 ("reserved_job.sh") has been submitted
[veseli@tolkien]$ qsub -ar 3 reserved_job.sh 
Your job 21 ("reserved_job.sh") has been submitted

[veseli@tolkien]$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@tolkien.ps.uud.com       BIP   2/4/4          0.02     lx24-x86      
     15 0.55500 regular_jo veseli       r     05/14/2008 14:34:32     1        
     16 0.55500 regular_jo veseli       r     05/14/2008 14:34:32     1        
     20 0.55500 reserved_j veseli       r     05/14/2008 14:35:02     1        
     21 0.55500 reserved_j veseli       r     05/14/2008 14:35:02     1

############################################################################
- PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
     17 0.55500 regular_jo veseli       qw    05/14/2008 14:34:22     1        
     18 0.55500 regular_jo veseli       qw    05/14/2008 14:34:23     1        
     19 0.55500 regular_jo veseli       qw    05/14/2008 14:34:24     1

The above example illustrates how ARs work. As long as particular reservation is valid, only jobs that are designated as part of it can utilize resources that have been reserved.

I think that AR will prove to be extremely valuable tool for planning grid resource usage, and I’m very pleased to see it in the new Grid Engine release.

Friday, June 6, 2008

Steaming Java

By Roderick Flores

When Rich asked us to walk through a software development process, I immediately
thought back to a conversation that I had with my friend Leif Wickland about building high-performance Java applications. So I immediately emailed him asking him for his best practices. We have both produced code that is as fast, if not faster than C compiled with optimization (for me it was using a 64-bit JRE on a x86_64 architecture with multiple cores).

That is not to say that if you were to spend time optimizing the equivalent C-code that it would not be made to go faster. Rather, the main point is that Java is a viable HPC language. On a related note, Brian Goetz of Sun has a
very interesting discussion on IBM's DeveloperWorks, Urban performance legends, revisited on how garbage collection allows faster raw allocation performance.

However I digress… Here is a summary of what we both came up with (in no
particular order):

It is vitally important to "measure, measure, measure," everything
you do. We can offer any set of helpful hints but the likelihood that all of them should be applied is extremely low.
It is equally important to remember to only optimize areas in the program that are bottlenecks. It is a waste of development time for no real gain.
One of the most simple and overlooked things that help your application is to overtly specify method parameters that are read-only using the final modifier. Not only can it help the compiler with optimization but it also
is a good way of communicating your intentions to your teammates. Furthermore, i
f you can make your method parameters final, this will help even more. One thing
to be aware of is that not all things that are declared final behave as expected (see Is that your final answer? for more detail).
If you have states shared between threads, make whatever you can final so that that the VM takes no steps to ensure consistency. This is not
something that we would have expected to make a difference, but it seems to help.
An equally ignored practice is using the finally clause. It i
s very important to clean up the code in a try block. You could leave open streams, SQL queries, or perhaps other objects lying around taking up space.
Create your data structures and declare your variables early. A core goal is to avoid allocating short-lived variables. While it is true that the garbage collector may reserve memory for variables that are declared often, why make it have to try to guess your intentions. For example, if a loop is called repeatedly, there is no need to say, for (int i = 0; …
when you should have declared i earlier. Of course you have to be careful
not to reset counters from inside of loops.
Use static for values that are constants. This may seem obvious, but not everybody does.
For loops embedded within other loops:
- Replace your outer loop with fixed-pool of threads.
  In the next release of java, this will be even easier using the fork-join keywords. This has become increasingly important with processors with many cores.
- Make sure that your innermost loop is the longest even if it doesn't necessarily map directly to the business goals. You shouldn't
  force the program to create a new loop too often as it wastes cycles.
- Unroll your inner-loops. This can save an enormous amount of time even if it isn't pretty. The quick test I just ran was 300% faster. If you haven'
  t unrolled a loop before, it is pretty simple:
  
  
        unrollRemainder = count%LOOP_UNROLL_COUNT;
  
  
  
        for( n = 0; n < unrollRemainder; n++ ) {
  
           // do some stuff here.
  
        }
  
  
  
        for( n = unrollRemainder; n < count; n+=LOOP_UNROLL_COUNT ) {
  
           // do stuff for n here
  
           // do stuff for n+1 here
  
           // do stuff for n+2 here
  
           …
  
           // do stuff for n+LOOP_UNROLL_COUNT - 1 here
  
        }
  
        Notice that both n and unrollRemainder were declared earlier as recommended previously.
Preload all of your input data and then operate on it later. There
is absolutely no reason that you should be loading data of any kind inside of your main calculation code. If the data doesn't fit or belong on one machine, use
a Map-Reduce approach to distribute it across the Grid.
Use the factory pattern to create objects.
- Data structures can be created ahead of time and only the necessary pieces are passed to the new object.
- Any preloaded data can also be segmented so that only the necessary parts are passed to the new object.
- You can avoid the allocation of short-lived variables by using constructors with the final keyword on its parameters.
- The factory can perform some heuristic calculations
  to see if a particular object should even be created for future processing.
When doing calculations on a large number of floating-point values,
use a byte array to store the data and a ByteWrapper to convert it to floats. This should primarily be used for read only (input) data. If you are writing floating-point values you should do this with caution as it may take
more time than using a float array. One major advantage that Java has when you use this approach is that you can switch between big and little-endian data rather easily.
Pass fewer parameters to methods. This results in less overhead. If
you can pass a static value it will pass one fewer parameter.
Use static methods if possible. For example, a FahrenheitToCelsius(float fahrenheit); method could easily be made static. The main advantage
here is that the compiler will likely inline the function.
There is some debate whether you should make particular methods
final if they are called often. There is a strong argument to not do this because the enhancement is small or nonexistent (see Urban Performance Legends or
once again Is that your final answer?). However my experience is that a small enhancement on a calculation that is run thousands of times can make a significant difference. Both Leif and I have seen measurable differences here. The key is to benchmark your code to be certain.

Wednesday, May 14, 2008

Grid Engine 6.2 Beta Release

By Sinisa Veseli

Grid Engine 6.2 will come with some interesting new features. In addition to advance resource reservations and array job interdependencies, this release will also contain a new Service Domain Manager (SDM) module, which will allow distributing computational resources between different services, such as different Grid Engine clusters or application servers. For example, SDM will be able to withdraw unneeded machines from one cluster (or application server) and assign it to a different one or keep it in its “spare resource pool”.

It is also worth mentioning that Grid Engine (and SDM) documentation is moving to Sun’s wiki.
The 6.2 beta release is available for download here.