Monday, December 8, 2008

The Private Clouds

By Ignacio Martin Llorente

Last month I was invited to give a couple of talks about Cloud computing at the wonderful C3RS (Cisco Cloud Computing Research Symposium). The slides are available online if you want to check them out. Although the audiences were quite heterogeneous, there was a recurrent question among the participants of these events: how can I set up my private cloud? Let me briefly summarize the motivation of the people asking this:

  • Lease compute capacity from the local infrastructure. These people acknowledge the benefits of virtualizing their own infrastructure as a whole. However, they are not interested in selling this capacity over the Internet, or at least it is not a priority for them. That is, they do not want to become an EC2 competitor, so they do not need to expose a cloud interface to the world.
  • Capacity in the cloud. They do not want to be the new EC2, but they do want to use EC2. The ability to move some services, or part of the capacity of a service, to an external provider is very attractive to them.
  • Open source. Current cloud solutions are proprietary and closed, so they need an open source solution to experiment with. Also, they are already using virtualization technologies that they would like to see integrated in the final solution.

I tell these people: take a look at OpenNebula. OpenNebula is a distributed virtual machine manager that allows you to virtualize your infrastructure. It also features integrated management of your virtual services, including networking and image management. Additionally, it ships with EC2 plug-ins that allow you to deploy virtual machines simultaneously on your local infrastructure and on Amazon EC2.

OpenNebula is modular by design to allow its integration with other tools, like the Haizea lease manager, or Nimbus, which gives you an EC2-compatible interface in case you need one. It is healthy open source software being improved in several projects, like RESERVOIR, and it has a growing community.

Go here if you want to set up your private cloud!

Ruben S. Montero

Reprinted from blog.dsa-research.org 

Friday, December 5, 2008

Using Grid Engine Subordinate Queues

By Sinisa Veseli

It is often the case that certain classes of computing jobs have high priority and require immediate access to cluster resources in order to complete on time. For such jobs one could reserve a special queue with an assigned set of nodes, but this solution might waste resources when no high priority jobs are running. An alternative approach to this problem in Grid Engine involves using subordinate queues, which allow preemption of job slots. The way this works is as follows:

  1. A higher priority queue is configured with one or more subordinate queues.
  2. Jobs running in subordinate queues are suspended when the higher priority queue becomes busy, and they are resumed when the higher priority queue is no longer busy.
  3. For any subordinate queue one can configure the number of job slots (the “Max Slots” parameter in the qmon “Subordinates” tab of the higher priority queue configuration) that must be filled in the higher priority queue to trigger a suspension. If “max slots” is not specified, then all job slots must be filled in the higher priority queue to trigger suspension of the subordinate queue (see the sketch below).
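
For reference, the same subordination can also be configured from the command line. The snippet below is only a sketch: the queue names come from the example that follows, the "=1" suffix is the slot threshold described above, and it assumes your Grid Engine version supports modifying queue attributes with qconf -mattr:

# Suspend "low" as soon as one slot in "high" is in use (sketch, not from the original post)
qconf -mattr queue subordinate_list "low=1" high
# Alternatively, edit the queue configuration interactively and set subordinate_list there
qconf -mq high
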
In order to illustrate this, on my test machine I’ve set up three queues (“low”, “medium” and “high”) intended to run jobs with different priorities. The “high” queue has both “low” and “medium” queues as subordinates, while the “medium” queue has “low” as its subordinate:
user@sgetest> qconf -sq high | grep subordinate
subordinate_list      low medium
user@sgetest> qconf -sq medium | grep subordinate
subordinate_list      low
user@sgetest> qconf -sq low | grep subordinate
subordinate_list      NONE
After submitting a low priority array job to the “low” queue, qstat returns the following information:
user@sgetest> qsub -t 1-10 -q low low_priority_job.sh
Your job-array 19.1-10:1 ("low_priority_job.sh") has been submitted
user@sgetest> qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
high@sgetest.univaud.com    BIP   0/2       0.07     lx24-amd64
----------------------------------------------------------------------------
medium@sgetest.univaud.com    BIP   0/2       0.07     lx24-amd64
----------------------------------------------------------------------------
low@sgetest.univaud.com    BIP   2/2       0.07     lx24-amd64
19 0.55500 low_priori user       r     11/24/2008 17:05:02     1 1
19 0.55500 low_priori user       r     11/24/2008 17:05:02     1 2

############################################################################
- PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
19 0.55500 low_priori user       qw    11/24/2008 17:04:50     1 3-10:1
Note that all available job slots on my test machine are full. Submission of the medium priority array job to the “medium” queue results in suspension of the previously running low priority tasks (this is indicated by the letter “S” next to the task listing in the qstat output):
user@sgetest> qsub -t 1-10 -q medium medium_priority_job.sh
Your job-array 20.1-10:1 ("medium_priority_job.sh") has been submitted
user@sgetest> qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
high@sgetest.univaud.com    BIP   0/2       0.06     lx24-amd64
----------------------------------------------------------------------------
medium@sgetest.univaud.com    BIP   2/2       0.06     lx24-amd64
20 0.55500 medium_pri user       r     11/24/2008 17:05:17     1 1
20 0.55500 medium_pri user       r     11/24/2008 17:05:17     1 2
----------------------------------------------------------------------------
low@sgetest.univaud.com    BIP   2/2       0.06     lx24-amd64    S
19 0.55500 low_priori user       S     11/24/2008 17:05:02     1 1
19 0.55500 low_priori user       S     11/24/2008 17:05:02     1 2

############################################################################
- PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
19 0.55500 low_priori user       qw    11/24/2008 17:04:50     1 3-10:1
20 0.55500 medium_pri user       qw    11/24/2008 17:05:15     1 3-10:1
Finally, submission of a high priority array job to the “high” queue results in the previously running medium priority tasks being suspended:
user@sgetest> qsub -t 1-10 -q high high_priority_job.sh
Your job-array 21.1-10:1 ("high_priority_job.sh") has been submitted
user@sgetest> qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
high@sgetest.univaud.com    BIP   2/2       0.06     lx24-amd64
21 0.55500 high_prior user       r     11/24/2008 17:06:02     1 1
21 0.55500 high_prior user       r     11/24/2008 17:06:02     1 2
----------------------------------------------------------------------------
medium@sgetest.univaud.com    BIP   2/2       0.06     lx24-amd64    S
20 0.55500 medium_pri user       S     11/24/2008 17:05:17     1 1
20 0.55500 medium_pri user       S     11/24/2008 17:05:17     1 2
----------------------------------------------------------------------------
low@sgetest.univaud.com    BIP   2/2       0.06     lx24-amd64    S
19 0.55500 low_priori user       S     11/24/2008 17:05:02     1 1
19 0.55500 low_priori user       S     11/24/2008 17:05:02     1 2

############################################################################
- PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
19 0.55500 low_priori user       qw    11/24/2008 17:04:50     1 3-10:1
20 0.55500 medium_pri user       qw    11/24/2008 17:05:15     1 3-10:1
21 0.00000 high_prior user       qw    11/24/2008 17:05:52     1 3-10:1


Medium priority tasks will be resumed after all high priority tasks are done, and low priority tasks will run after the medium priority job is finished.

One thing worth pointing out is that Grid Engine queue subordination is implemented at the “instance queue” level. In other words, if I had machine "A" associated with my queue “low”, but not with queues “high” or “medium”, jobs running on machine "A" would not be suspended even if there were higher priority jobs waiting to be scheduled.
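
One quick way to see where subordination can actually take effect is to compare the host lists of the two queues; only hosts that appear in both lists will have their subordinate queue instances suspended:

user@sgetest> qconf -sq high | grep hostlist
user@sgetest> qconf -sq low | grep hostlist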

Tuesday, December 2, 2008

Managing Resource Quotas in Grid Engine

By Sinisa Veseli

It is often the case that cluster administrators must impose limits on the use of certain resources. A good example would be preventing a particular user (or set of users) from utilizing an entire queue (or the entire cluster) at any point. If you’ve ever tried doing something like that in Grid Engine (SGE), then you know that it is not immediately obvious how to impose limits on resource usage.

SGE has a concept of “resource quota sets” (RQS), which can be used to limit maximum resource consumption by any job. The relevant qconf command line switches for manipulating resource quota sets are “-srqs” and “-srqsl” (show), “-arqs” (add), “-mrqs” (modify) and “-drqs” (delete).

Each RQS must have the following parameters: name, description, enabled and limit. The RQS name cannot contain spaces, but its description can be an arbitrary string. The boolean “enabled” flag specifies whether the RQS is enabled or not, while the “limit” field denotes a resource quota rule that consists of an optional name, filters for a specific job request, and the resource quota limit. Note that one can have multiple “limit” fields associated with a given RQS. For example, the following RQS prevents user “ahogger” from occupying more than 1 job slot in general, and it also prevents the same user from running jobs in the headnodes.q queue:

$ qconf -srqs ahogger_job_limit
{
name         ahogger_job_limit
description  "limit ahogger jobs"
enabled      TRUE
limit        users ahogger to slots=1
limit        users {ahogger} queues {headnodes.q} to slots=0
}
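
As a further (hypothetical) illustration, a single rule can also cap every user individually by using the {*} wildcard, which expands the rule per user instead of applying one shared limit:

{
name         per_user_slot_cap
description  "hypothetical example: cap any single user at 20 slots"
enabled      TRUE
limit        users {*} to slots=20
}

A new rule set like this would typically be created with “qconf -arqs”, which opens an editor pre-filled with a template.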


The exact format in which RQS have to be specified is, like everything else, well documented in SGE man pages (“man sge_resource_quota”).

Monday, November 17, 2008

Automating Grid Engine Monitoring

By Sinisa Veseli

When visiting client sites I often notice various issues with the existing distributed resource management software installations. The problems usually vary from configuration issues to queues in an error state. While things like inadequate resources and queue structure usually require more analysis and better design, problems like queues in an error state are easily detectable. So, cluster administrators, who are often busy with many other duties, should try to automate monitoring tasks as much as they can. For example, if you are using Grid Engine, you can easily come up with scripts like the one below, which looks for several different kinds of problems in your SGE installation:

#!/bin/sh

# Source the Grid Engine environment (default Unicluster Express location).
. /usr/local/unicluster/unicluster-user-env.sh

# Print the scheduler's explanation for a problematic queue instance.
explainProblem() {
    qHost=$1   # queue instance where the problem was found
    msg=`qstat -f -q $qHost -explain aAEc | tail -1 | sed 's?-??g' | sed '/^$/d'`
    echo $msg
}

# Report all queue instances whose state column contains the given signature.
checkProblem() {
    description=$1  # problem description
    signature=$2    # problem signature (state letter in the qstat output)
    for q in `qconf -sql`; do
        cmd="qstat -f -q $q | grep $q | awk '{if(NF>5 && index(\$NF, \"$signature\")>0) print \$1}'"
        qHostList=`eval $cmd`
        if [ "$qHostList" != "" ]; then
            for qHost in $qHostList; do
                msg=`explainProblem $qHost`
                echo "$description on $qHost:"
                echo "  $msg"
                echo ""
            done
        fi
    done
}

echo "Grid Engine Issue Summary"
echo "========================="
echo ""
checkProblem Error E
checkProblem SuspendThreshold A
checkProblem Alarm a
checkProblem ConfigProblem c


Note that the above script should work with Unicluster Express 3.2 installed in the default (/usr/local/unicluster) location. It can easily be modified to, for example, send email to administrators in case problems that need attention are found. Although simple, such scripts usually go a long way towards ensuring that your Grid Engine installation operates smoothly.
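
For instance, a crontab entry along the following lines would mail the report to an administrator once an hour (the script path, schedule and e-mail address below are placeholders, not part of the original post):

# Hypothetical crontab entry; adjust path, schedule and recipient to your site
0 * * * * /usr/local/unicluster/bin/sge_issue_summary.sh 2>&1 | mail -s "Grid Engine issue summary" admin@example.com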

Thursday, November 6, 2008

Who Cares What's inside a Cloud?

By Roderick Flores

When I consider my microwave, telephone, or television, I see fairly sophisticated applications that I simply plug into service providers and get useful results. If I choose to switch between individual service providers I can do so easily (assuming certain levels of deregulation of utility monopolies, of course). Most importantly, while I understand how these appliances work, I would never want to build one myself. Yet I am not required to do so, because the providers use standardized interfaces that appliance manufacturers can easily offer: I buy my appliances as I might any other tool. Consequently, I can switch out the manufacturer or model for each of the services I use without interacting with the provider. I use these tools in a way that makes my work and life more efficient.



Nobody listens in on my conversations, nor does anyone receive services at my expense; I can use these services how I wish, and, because of competition, I can expect an outstanding quality of service. At the end of the month, I get a bill from my providers for the services I used. These monetary costs are far outweighed by the convenience these services offer.



It is this sort of operational simplicity that motivated the first call for computational power as a utility in 1965. Like the electrical grid, a consumer would simply plug in their favorite application and use the compute power offered by a provider. Beginning in the 1990s, this effort centered around the concept of Grid computing.



Just like in the early days of electricity service, there were many issues with providing Grid computing. The very first offerings were proprietary or narrowly focused. The parallels with the electric industry are easily recognized. Some might provide street lighting, whereas others would provide power for home lighting, still others power for transportation, and yet another group power for industrial applications. Moreover, each provider used different interfaces to get the power. Thus switching between providers, not a rare occurrence in a volatile industry, was no small undertaking. This clearly was very costly for the consumer.



It took an entrepreneur to come to the industry and unify electrical services for all applications while also creating a standardized product (see http://www.eei.org/industry_issues/industry_overview_and_statistics/history for a quick overview). Similarly several visionaries had to step in and define what a Grid computer needed to do in order to create a widely consumable product. While these goals were largely met and several offerings became very successful, Grid computing never really became the firmly rooted utility-like service that we hoped for. Rather, it seems to have become an offering for specialized high-performance computing users.



This market is not the realm of service that I started thinking about early in this post. Take television service: this level of service is neither for a single viewer nor a small-business who might want to repackage a set of programs to its customers (say a sports bar). Rather it is for large-scale industries whose service requirements are unimaginable by all but a few people. I cannot even draw a parallel to television service. In telecommunication it would be the realm of a CLEC.



Furthermore, unlike my microwave, I am expected to customize my application to work well on a grid. I cannot simply plug it in and get better service than I can from my own PC. It would be the equivalent of choosing to reheat my food on my stove or building my own microwave. You see, my microwave, television service, and phone services are not just basic offerings of food preparation, entertainment, and communication. Instead, these are sophisticated systems that make my work and life easier. Grid computing, while very useful, does not simplify program implementation.



So in steps cloud computing: an emerging technology that seems to have significant overlap with grid computing while also providing simplifying services (something as a service). I may still have to assemble a microwave from pre-built pieces but everything is ready for me to use. I only have to add my personal touches to assemble a meal. It really isn't relevant whether the microwave is central to the task or just one piece of many.



When I approach a task that I hope to solve using a program, how might I plug that in just as easily? Let's quickly consider how services are provided for television. When I plug my application (TV) into the electricity provider as well as a broadcaster of some sort, it just works. I can change the channel to the streams that I like. I can buy packages that provide me the best set of streams. In addition, some providers will offer me on-demand programming as well as internet and telephone services. If anything breaks, I call a number and they deal with it. None of this requires anything of me. I pay my bill and I get services.



Okay, how would that work for a computation? Say I want to find the inverse for a matrix. I would send out my data to the channel that inverted matrices the way I like them. The provider will worry about attaining the advertised performance, reliability, scalability, security, sustainability, device/location independence, tenancy, and capital expenditure: those characteristics of the cloud that I could not care less about. Additionally, the cloud properties that Rich Wellner assembled don't interest me much either. Certainly they may be differentiators, but the actual implementation is somebody else's problem in the same way that continuous electrical service provision is not my chief concern when I turn on the TV. What I want and will get is an inverse to the matrix I submitted in the time frame I requested deposited where I requested it to be put. I may use the inverted matrix to simultaneously solve for earthquake locations and earth properties or for material stresses and strains in a two-dimensional plate. That is my recipe and my problem.



After all, I should get services "without knowledge of, expertise with, or control over the technology infrastructure that supports them," as the cloud computing wiki page claims. Essentially, the aforementioned cloud characteristics are directed towards service providers rather than to the non-expert consumer that the wiki definition highlights. Isn't the differentiator between the Cloud and the Grid the concealment of the complex infrastructure underneath? If the non-expert consumer is expected to worry about algorithm scalability, distributing data, starting and stopping resources and all of that, they certainly will need to gain some expertise quickly. Further, once they have that skill, why wouldn't they just use a mature Grid offering rather than deal with the non-standardized and chaotic clouds? Are these provider-specific characteristics not just a total rebranding of Grid?



As such, I suggest that several consumer-based characteristics should replace the rather inconsequential provider-internal ones that currently exist.



A cloud is characterized by services that:



  • use a specified algorithm to solve a particular problem;
  • can be purchased for one-time, infrequent use, or regular use;
  • state their peak, expected, and minimum performances;
  • state the expected response time;
  • can be queried for changes to expected response time;
  • support asynchronous messaging. A consumer must be able to discover when things are finished;
  • use standard, open, general-purpose protocols and interfaces (clearly);
  • have specified entry-points;
  • can interact with other cloud service providers. In particular, a service should be able to send output to long-term cloud-storage providers;


Now that sounds more like Computation-as-a-Service.
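
To make that concrete, here is a purely hypothetical sketch of what submitting such a job could look like from the consumer's side; the endpoint, parameters and delivery targets are invented for illustration and do not describe any existing provider:

# Hypothetical Computation-as-a-Service request: send the matrix, state the
# acceptable response time, and say where the result and the completion notice should go.
curl -X POST https://matrix-service.example.com/invert \
     -F "matrix=@my_matrix.dat" \
     -F "deadline=2h" \
     -F "deliver_to=s3://my-bucket/results/" \
     -F "notify=mailto:me@example.com"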

Monday, November 3, 2008

Cloud Computing: Commodity or Value Sale?

By Rich Wellner

There is a controversy in the cloud community today about whether the market is going to be one based on value or on price. Rephrased: will cloud computing be a commodity or an enablement technology?



A poster on one of the cloud computing lists asserted that electricity would be a key component of pricing. He was then jumped on by people saying that value would be the key.



It seems like folks are talking past one another.



His assertion is true if CC is a commodity.



Now that said, there are precious few commodities in IT. Maybe internet connectivity is one. Monitors might be another. Maybe there are a few more.



But very quickly you get past swappable components that do very nearly the same job and into the realm of 'stuff' that is not easily replaceable. Then the discussion turns to one of value.



Amazon recognized the commodity of books and won the war over people who were trying to sell value. They appear to be attempting to do the same with computer time, which makes the battle they will fight over the next few years with Microsoft (and the increasing number of smaller players) extra interesting.



There is also the problem of making sweeping statements like "the market will figure things out". There is no "the market". Even on Wall Street. The reason things happen is because different people and institutions have different investment goals. Those goals vary over time and create growing or shrinking windows of opportunity for other people and institutions.



I've made my bet on how "the market" for cloud computing will shake out in the short to medium term. Now I'm just hoping that there are enough of the people and institutions my bet is predicated on in existence.

Wednesday, October 29, 2008

Elastic Management of Computing Clusters

By Ignacio Martin Llorente

Besides all the hype, clouds (i.e. a service for the on-demand provision of virtual machines; others would say IaaS) are making utility computing a reality; check, for example, the Amazon EC2 case studies. This new model, and virtualization technologies in general, is also being actively explored by the scientific community. There are quite a few initiatives that integrate virtualization with a range of computing platforms, from clusters to Grid infrastructures. Once this integration is achieved, the next step is natural: jump to the clouds and provision the VMs from an external site. For example, recent work from UNIVA UD has demonstrated the feasibility of supplementing a UNIVA Express cluster with EC2 resources (you can download the whitepaper to learn more).


OpenNebula virtual infrastructure engine components and its integration with Amazon EC2


This cloud provision model can be further integrated with the
in-house physical infrastructure when it is combined with a virtual
machine (VM) management system, like OpenNebula.
A VM manager is responsible for the efficient management of the virtual
infrastructure as a whole, by providing basic functionality for the
deployment, control and monitoring of VMs on a distributed pool of
resources. The use of this new virtualization layer decouples the
computing cluster from the physical infrastructure, and so extends the
classical benefits of VMs to the cluster level (i.e. cluster
consolidation, cluster isolation, cluster partitioning and elastic
cluster capacity).


Architecture of an Elastic Cluster

A computing cluster can be easily virtualized by putting the front-end and worker nodes into VMs. In our case, the virtual cluster front-end (SGE master host) is deployed on the local resources, with Internet connectivity so that it can communicate with the Amazon EC2 VMs. This cluster front-end also acts as the NFS and NIS server for every worker node in the virtual cluster.


The virtual worker nodes communicate with the front-end through a private local area network. The local worker nodes are connected to this vLAN through a virtual bridge configured in every physical host.  The EC2 worker nodes
are connected to the vLAN with an OpenVPN tunnel, which is established
between each remote node (OpenVPN clients) and the cluster front-end
(OpenVPN server). With this configuration, every worker node (either
local or remote) can communicate with the front-end and can use the
common network services transparently. The architecture of the cluster
is shown in the following figure:


Virtual Cluster Architecture

Figure courtesy of Prof. Rafael Moreno
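
As a rough illustration of the tunnel described above, each EC2 worker node might join the cluster vLAN with an OpenVPN invocation along these lines (a minimal sketch: the host name, port and certificate file names are placeholders, and the real setup also bridges the tap device into the vLAN on the front-end side):

# Hypothetical client-side invocation on an EC2 worker node
openvpn --client --dev tap --proto udp \
        --remote frontend.example.org 1194 \
        --ca ca.crt --cert worker.crt --key worker.key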


Deploying a SGE cluster with OpenNebula and Amazon EC2

The last release of OpenNebula includes a driver to deploy VMs in the
EC2 cloud, and so it integrates the Amazon infrastructure with your
local resources. The EC2 is managed by OpenNebula just as another local
resource with a configurable pre-fixed size,
to limit the cluster capacity (i.e. SGE workernodes) that can be
allocated in the cloud. In this set-up, your local resources would look
like as follows:


>onehost list
HID NAME     RVM      TCPU   FCPU   ACPU    TMEM    FMEM STAT
   0 ursa01     0       800    798    800 8387584 7663616  off
   1 ursa02     0       800    798    800 8387584 7663616  off
   2 ursa03     0       800    798    800 8387584 7663616  on
   3 ursa04     2       800    798    600 8387584 6290432  on
   4 ursa05     1       800    799    700 8387584 7339008  on
   5 ec2        0       500    500    500 8912896 8912896  on

The last line corresponds to EC2, currently configured to host up to 5 m1.small instances.


The OpenNebula EC2 driver translates a general VM deployment file into an EC2 instance description. The driver assumes that a suitable Amazon machine image (AMI) has been previously packed and registered in the S3 storage service, so when a given VM is to be deployed in EC2 its AMI counterpart is instantiated. A typical SGE worker node VM template would look like this:


NAME   = sge_workernode
CPU    = 1
MEMORY = 128                                                            

#Xen or KVM template machine, used when deploying in the local resources
OS   = [kernel="/vmlinuz",initrd= "/initrd.img",root="sda1" ]
DISK = [source="/imges/sge/workernode.img",target="sda",readonly="no"]
DISK = [source="/imges/sge/workernode.swap",target="sdb",readonly="no"]
NIC  = [bridge="eth0"]

#EC2 template machine, this will be used when submitting this VM to EC2
EC2 = [ AMI="ami-d5c226bc",
        KEYPAIR="gsg-keypair",
        AUTHORIZED_PORTS="22",
        INSTANCETYPE=m1.small]

Once deployed, the cluster would look like this (SGE master, 2 local worker nodes and 2 EC2 worker nodes):


>onevm list
  ID      NAME STAT CPU     MEM        HOSTNAME        TIME
  27  sgemast runn 100 1232896          ursa05 00 00:41:57
  28  sgework runn 100 1232896          ursa04 00 00:31:45
  29  sgework runn 100 1232896          ursa04 00 00:32:33
  30  sgework runn   0       0             ec2 00 00:23:12
  31  sgework runn   0       0             ec2 00 00:21:02

You can get additional info about your EC2 VMs, like the IP address, using the onevm show command.


So, it is easy to manage your virtual cluster with OpenNebula and EC2, but what about efficiency? Besides the inherent overhead induced by virtualization (around 10% for processing), the average deployment time of a remote EC2 worker node is 23.6s, while a local one takes only 3.3s. Moreover, when executing an HTC workload, the overhead induced by using EC2 (the VPN and a slower network connection) can be neglected.


Ruben S. Montero


This is a joint work with Rafael Moreno and Ignacio M. Llorente


Reprinted from blog.dsa-research.org 

Monday, October 20, 2008

Auditing the Cloud

By Rich Wellner

I've written here about the importance of SLAs for useful cloud computing platforms on a few occasions in the past. The idea behind clouds, that you can get access to resources on demand, is an appealing one. However, it is only part of the total picture. Without an ability to state what you want and go to bed, there isn't much value in the cloud.



Think about that for a minute. With the cloud computing offerings currently available there are no meaningful SLAs written down anywhere. Yet people, every day, run their production applications on an implicit SLA that is internalized something like "amazon is going to give me N units of work for M price".



There are two problems with this.



  • Amazon doesn't scale your resources. Your demand may have spiked and you are still running on the resources you signed up for.
  • There is no audit capability on EC2.

In the Cloud Computing Bill of Rights we wrote about three important attributes that need to be available to do an audit:

  • Events -- The state changes and other factors that affected your system availability.
  • Logs -- Comprehensive information about your application and its runtime environment.
  • Monitoring -- Should not be intrusive and must be limited to what the cloud provider reasonably needs in order to run their facility.

The idea here is that rather than just accepting what your cloud provider sends you at the end of the month as a bill, the world of cloud computing is complex enough that a reasonable set of runtime information must be made available to substantiate the provider's claim for compensation.

This is particularly true in the world of SLAs. If my infrastructure is regularly scaling up, out, down or in to meet demands it is essential to be able to verify that the infrastructure is reacting the way that was contracted. Without that, it will be very hard to get people to trust the cloud.

Monday, October 13, 2008

Cloud and Grid are Complementary Technologies

By Ignacio Martin Llorente

There is a growing number of posts and articles trying to show how cloud computing is a new paradigm that supersedes Grid computing by extending its functionality and simplifying its exploitation, even announcing that Grid computing is dead. It seems that new technologies and paradigms always have the mission of replacing existing ones. Some of these contributions do not fully understand what Grid computing is, focusing their comparative analysis on simplicity of interfaces, implementation details or basic computing aspects. Other posts define Cloud in the same terms as Grid, or create a taxonomy which includes Grid and cluster computing technologies.





Grid is an interoperability technology, enabling
the integration and management of services and resources in a
distributed, heterogeneous environment. The technology provides support
for the deployment of different kinds of infrastructures joining
resources which belong to different administrative domains. In the
special case of a Compute Grid infrastructure, such as EGEE or TeraGrid,
Grid technology is used to federate computing resources spanning
multiple sites for job execution and data processing. There are many
success cases demonstrating that Grid technology provides the support
required to fulfill the demands of several collaborative scientific and
business processes.



On the other hand, I do not think there is a single definition for cloud computing, as it denotes multiple meanings for different communities (SaaS, PaaS, IaaS...). From my view, the only new feature offered by cloud systems is the provision of virtualized resources as a service, with virtualization as the enabling technology. In other words, the relevant contribution of cloud computing is the Infrastructure as a Service (IaaS) model. Virtualization, rather than other less significant issues such as the interfaces, is the key advance. At this point, I should remark that virtualization was used by the Grid community before the arrival of the "Cloud".



Once I have clearly stated my position about Cloud and Grid, let me
show how I see Cloud (and virtualization as enabling technology) and
Grid as complementary technologies that will coexist and cooperate at
different levels of abstraction in future infrastructures.


There will be a Grid on top of the Cloud


Before explaining the role of cloud computing as a resource provider for Grid sites, we should understand the benefits of the virtualization of the local infrastructure (Enterprise or Local Cloud?). How can I access a cloud provider on demand if I have not previously virtualized my local infrastructure?


Existing virtualization technologies allow a full separation of resource provisioning from service management. A new virtualization layer between the service and the infrastructure layers decouples a server not only from the underlying physical resource but also from its physical location, without requiring any modification within the service layers from either the service administrator or the end-user perspective. Such decoupling is the key to supporting the scale-out of an infrastructure in order to supplement local resources with cloud resources to satisfy peak or fluctuating demands.



Getting back to the Grid computing case, the virtualization of a Grid site provides several benefits, which overcome many of the technical barriers for Grid adoption:


  • Easy support for VO-specific worker nodes
  • Reduced gridification cycles
  • Dynamic balancing of resources between VOs
  • Fault tolerance of key infrastructure components
  • Easier deployment and testing of new middleware distributions
  • Distribution of pre-configured components
  • Cheaper development nodes
  • Simplified deployment of training machines
  • Performance partitioning between local and grid services
  • On-demand access to cloud providers

If you are interested in more details about how virtualization
and cloud computing can support compute Grid infrastructures you can
have a look at my presentation "An Introduction to Virtualization and Cloud Technologies to Support Grid Computing" (EGEE08). I also recommend the report "An EGEE Comparative study: Clouds and grids - evolution or revolution?".


There is technology that supports the above use case. The OpenNebula engine enables the dynamic deployment and re-allocation of virtual machines on a pool of physical resources, providing support for on-demand access to Amazon EC2 resources. On the other hand, Globus Nimbus provides a free, open source infrastructure for the remote deployment and management of virtual machines, allowing you to create compute clouds.


There will be a Grid under the Cloud


There is a growing interest in the federation of cloud sites. Cloud providers are opening new infrastructure centers at different geographical locations (see IBM or Amazon Availability Zones), and it is clear that no single facility/provider can create a seemingly infinite infrastructure capable of serving massive numbers of users at all times, from all locations. David Wheeler once said, "Any problem in computer science can be solved with another layer of indirection… But that usually will create another problem." In the same vein, the federation of cloud sites involves many technological and research challenges, but the good news is that some of them are not new, and have already been studied and solved by the Grid community.


As stated above, Grid is not only about computing. Grid is a technology for federation. In recent years, there has been a huge investment in research and development of technological components for the sharing of resources across sites. Several middleware components for file transfer, SLA negotiation, QoS, accounting, monitoring... are available, most of them open source. As also predicted by Ian Foster in his post "There's Grid in them thar Clouds", those will be the components that could enable the federation of cloud sites. On the other hand, other components have to be defined and developed from scratch, mainly those related to the efficient management of virtual machines and services within and across administrative domains. That is exactly the aim of the Reservoir project, the European initiative in Cloud Computing.


Conclusions


In order to conclude this post let me venture some predictions about the coexistence of Grid and Cloud computing in future infrastructures:


  • Virtualization, cloud, grid and cluster are complementary technologies that will coexist and cooperate at different levels of abstraction
  • Although there are early adopters of virtualization in the Grid/cluster/HPC community, its full potential has not been exploited yet
  • In a few years, the separation of job management from resource management through a virtualized infrastructure will be common practice
  • Emerging open-source VM managers, such as OpenNebula, will contribute to speeding up this adoption
  • Grid/cluster/HPC infrastructures will maintain a resource base scaled to meet the average workload demand and will transparently access cloud providers to meet peak demands
  • Grid technology will be used for the federation of clouds

In summary, let's try to forget about the hype and concentrate on the complementary functionality provided by both paradigms. My message to the user community: the relevant issue is to evaluate which technology meets your requirements; it is unlikely that a single technology will meet all needs. My message to the Grid community: please do not see Cloud as a threat. Virtualization and Cloud are needed to solve many of the technical barriers to wider Grid adoption. My message to the Cloud community: please try to take advantage of the research and development performed by the Grid community in the last decade.


Ignacio Martín Llorente



Reprinted from blog.dsa-research.org

Wednesday, September 17, 2008

The OpenNebula Engine for Data Center Virtualization and Cloud Solutions

By Ignacio Martin Llorente

Virtualization has opened up avenues for new resource management techniques within the data center. Probably its most important characteristic is the ability to dynamically shape a given hardware infrastructure to support different services with varying workloads, thereby effectively decoupling the management of the service (for example a web server or a computing cluster) from the management of the infrastructure (e.g. the resources allocated to each service or the interconnection network).


A
key component in this scenario is the virtual machine manager. A VM
manager is responsible for the efficient management of the virtual
infrastructure as a whole, by providing basic functionality for the
deployment, control and monitoring of VMs on a distributed pool of
resources. Usually, these VM managers also offer high availability
capabilities and scheduling policies for VM placement and physical
resource selection. Taking advantage of the underlying virtualization
technologies and according to a set of predefined policies, the VM
manager is able to adapt the physical infrastructure to the services it
supports and their current load. This adaptation usually involves the
deployment of new VMs or the migration of running VMs to optimize their
placement.


The dsa-research group
at the Universidad Complutense de Madrid has released under the terms
of the Apache License, Version 2.0, the first stable version of the OpenNebula Virtual Infrastructure Engine.
OpenNebula enables the dynamic allocation of virtual machines on a pool
of physical resources, so extending the benefits of existing
virtualization platforms from a single physical resource to a pool of
resources, decoupling the server not only from the physical
infrastructure but also from the physical location. OpenNebula is a
component being enhanced within the context of the RESERVOIR European Project.


The new VM manager differs from existing VM managers in its highly modular and open architecture, designed to meet the requirements of cluster administrators. OpenNebula 1.0 supports the Xen and KVM virtualization platforms and provides several features and capabilities for dynamic VM management, such as centralized management, efficient resource management, powerful API and CLI interfaces for monitoring and controlling VMs and physical resources, and a fault-tolerant design. Two of the outstanding new features are its support for advance reservation leases and on-demand access to remote cloud providers.


Support for Advance Reservation Leases


Haizea
is an open source lease management architecture that OpenNebula can use
as a scheduling backend. Haizea uses leases as a fundamental resource
provisioning abstraction, and implements those leases as virtual
machines, taking into account the overhead of using virtual machines
(e.g., deploying a disk image for a VM) when scheduling leases. Using
OpenNebula with Haizea allows resource providers to lease their
resources, using potentially complex lease terms, instead of only
allowing users to request VMs that must start immediately.


Support for On-Demand Access to Amazon EC2 Resources


Recently, virtualization has also brought about a new utility computing model, called cloud computing, for the on-demand provision of virtualized resources as a service. The Amazon Elastic Compute Cloud is probably the best example of this new paradigm for elastic capacity provisioning. Thanks to virtualization, clouds can be used efficiently to supplement local capacity with outsourced resources. The joint use of these two technologies, VM managers and clouds, will arguably change the structure and economics of current data centers. OpenNebula provides support for accessing Amazon EC2 resources to supplement local resources with cloud resources to satisfy peak or fluctuating demands.



Scale-out of Computing Clusters with OpenNebula and Amazon EC2


As a use case to illustrate the new capabilities provided by OpenNebula, the release includes documentation about the application of this new paradigm (i.e. the combination of VM managers and cloud computing) to a computing cluster, a typical data center service. The use of a new virtualization layer between the computing cluster and the physical infrastructure extends the classical benefits of VMs to the computing cluster, providing cluster consolidation, cluster partitioning and support for heterogeneous workloads. Moreover, the integration of the cloud in this layer allows the cluster to grow on demand with additional computational resources to satisfy peak demands.


Ignacio Martín Llorente



Reprinted from blog.dsa-research.org

Tuesday, September 16, 2008

Cloud Caucusing

By Rich Wellner


Several months ago on this blog, I mused on what was meant by the term cloud computing.  At the time, it was even more difficult than it is today to get a solid definition of the concept.  Since then, many opinions have been bandied about providing plenty of fuel for the debate.  While I think the concept has solidified some, cloud computing remains a highly polysemous term where folks from different backgrounds have developed their own definitions based upon their particular worldviews.  These viewpoints come from vendors, specialists, researchers, as well as different user communities.



Although a unified definition for cloud computing has not emerged, the concept has gained a lot of traction. I believe that this is because each interested group has found significant promise in what they call the cloud. Of course anything with this much possibility will certainly see some hype. As I have said before: the term invokes thoughts of transient beauty and power; even marketing folks can get excited about this one! (Compare that to SaaS.)


In any event, I thought that I would give you a quick idea of the types of discussions going on around cloud computing on the internet:





Compare these to one of the earliest usages of the term (search for cloud).  Clearly, these documents are far from a representative set of the discussions going on out there.  It just so happened that I selected a few from those I have read lately. There really is a lot going on out there.



Ultimately I expect to see many types of formalized clouds, each depending on their
operating environments and behaviors — just like I see when I look outside my
window. Once that happens, the big debates about how to interoperate between clouds of very different nature will begin. Transforming a concept into a widely accepted framework is never easy.  After all, why should I have to bend my perfect cloud so that it works with yours?



So what is the upside of all this banter? It turns out that the less often a word is used, the faster it evolves. Ironically, the hype may actually force this community into consensus.  As long as we keep this dialog going, we should expect a formalized cloud to come about in no time!!!

Thursday, September 4, 2008

A Cloud by Any Other Name

By Rich Wellner

The cloud list on google has been buzzing lately about the term "Enterprise Cloud" and whether it had any significance.



I had to chuckle as history started to repeat itself again between the early days of the grid and the early days of the cloud.



In our book Pawel and I wrote a section titled "How the Market Understands Grids". We didn't try to dictate terms, we tried to document the language in place at that moment in time.



In interviewing users we gathered the following terms:



  • Clusters -- Computers standing together, but accessible only to a small group of people
  • Departmental grids -- Multiple clusters accessible on a common backplane, but owned by one department
  • Enterprise grids -- Corporate resources available to all in the company (known today as an Enterprise Cloud)
  • Partner grids -- A few companies working together on big problems and sharing resources to accomplish their goals.
  • Open grids -- Many organizations making resources available to other members of that grid. A key distinction between an open grid and a partner grid is that an open grid doesn't typically have a key application or goal while a partner grid does.


We blanched a bit because, to us, grid computing meant only the last definition, and we viewed the other ones as missing some key attributes that those of us who had been working in the grid field since its inception thought were really important.



We see the same thing happening today with the term cloud and particularly in the term Enterprise Cloud.



That said, is Enterprise Cloud really an oxymoron, as one person suggested?



First we have to get to definitions:



Here are the key characteristics from the cloud computing wiki:



  • Capital expenditure minimized and thus low barrier to entry, as infrastructure is owned by the provider and does not need to be purchased for one-time or infrequent intensive computing tasks. Services are typically available to, or specifically target, retail consumers and small businesses.
  • Device and location independence which enables users to access systems regardless of location or what device they are using (eg PC, mobile).
  • Multitenancy enabling sharing of resources (and costs) among a large pool of users, allowing for:
    • Centralization of infrastructure in areas with lower costs (eg real estate, electricity)
    • Peak-load capacity increases (users need not engineer for highest possible load levels)
    • Utilization and efficiency improvements for systems that are often only 10-20% utilised.
  • Performance is monitored and consistent but can be affected by insufficient bandwidth or high network load.
  • Reliability by way of multiple redundant sites, which makes it suitable for business continuity and disaster recovery, however IT and business managers are able to do little when an outage hits them.
  • Scalability which meets changing user demands quickly, without having to engineer for peak loads. Massive scalability and large user bases are common but not an absolute requirement.
  • Security which typically improves due to centralization of data, increased security-focused resources, etc. but which raises concerns about loss of control over certain sensitive data. Accesses are typically logged but accessing the audit logs themselves can be difficult or impossible.
  • Sustainability through improved resource utilisation, more efficient systems and carbon neutrality.

None of those seem to exclude the term Enterprise Cloud.

Here's the list of attributes I compiled from the cloud google group and others IRL:

  1. Multiple vendors accessible through open standards and not centrally administered
  2. Non-trivial QOS (see the gmail debate thread)
  3. On demand provisioning
  4. Virtualization
  5. The ability for one company to use another's resources (e.g. bobco using ec2)
  6. Discoverability across multiple administrative domains (e.g. brokering to multiple cloud vendors)
  7. Data storage
  8. Per usage billing
  9. Resource metering and basic analytics
  10. Access to the data – could be limited by bandwidth/latency, security
  11. Compliance – architecture/implementation, audit, verification
  12. Policy based access – to data, applications and visibility
  13. Security not only for data but also for applications

Now here we start to see some things that aren't applicable to enterprise clouds (i.e. 1, 5, 6). But the bulk of the list still works. And it's worth noting that EC2 fails on four of those things (i.e. 1, 11, 12, 13), but people don't hesitate to allow them the use of the term cloud.

In previous technology revolutions I learned the lesson (slowly) not to care so much about what things are called as about what they do (which was why, in my early writings on this group, I was trying to point out to people, mostly unsuccessfully, that there are lessons to be learned from grid computing). But claiming there is a canonical definition of cloud and that enterprise cloud is a nonsense term doesn't seem accurate on the face of things. Enterprise Cloud does, however, capture the essence of what many large corporate IT groups are doing/considering. Rather than telling them they shouldn't be calling it cloud/grid/enterprise cloud/managed services/SaaS/whatever, I'm taking the approach of helping them meet their business needs, with technology wearing a variety of banners, and letting them call it whatever they like.

Monday, July 21, 2008

I have a Theory

By Roderick Flores

It was with great curiosity that I read Chris Anderson's article on the end of theory. To summarize his position, the "hypothesize, model, and test" approach to science has become obsolete now that there are petabytes of information and countless numbers of computers capable of processing that data. Further, this data-tsunami has made the search for models of real-world phenomena pointless because, "correlation is enough."



The first thing that struck me as ironic about this argument is that statistical correlation is itself a model, with all of its associated simplifying assumptions and baggage. Just how do I assign a measure of similarity between a set of objects without having a mathematical representation (i.e. a model) of those things? How might I handle strong negative correlation in this analysis? What about the null hypothesis? While not interesting, per se, it is useful information. Will a particular measurement be allowed to correlate with more than a single result-cluster?



Additionally, we must decide how to relate these petabytes of measurements into correlated-clusters. As before, the statistics that are used to calculate correlation are also models. Are we considering Gaussian distributions, scale-invariant power-laws, or perhaps a state-driven sense of probability? Are we talking about events that have a given likelihood such as the toss of a coin or, more likely, subjective plausibility? You need to be very cautious when choosing your statistical model. For example, using a bell-curve to describe unbounded-data destroys any real sense of correlation.



Regardless of how you statistically model your measurements, you must understand your data, or your correlations may not make sense. For example, imagine that I have two acoustic time-series. How do I measure the correlation of these two recordings to determine how well they are related? The standard approach is to simply convolve the two signals and look for a value that indicates “significant correlation”, whatever your model for that turns out to be. Yet this doesn't mean much unless I understand my data. Were each of these time-series recorded at the same sampling rate? For example, if I have 20 samples of a 10Hz sine-wave recorded at 100 samples per second it will appear exactly the same as 20 samples of a 5Hz sine-wave recorded at 50 samples per second. If I naively plot the samples, they will correlate perfectly. Basically, if I don't understand my data, I can easily and erroneously report that the correlation of the two signals is perfect when in fact they have zero correlation.



Finally, what I find most intriguing is the presumption that the successful correlation of petabytes of data culled web-pages and the associated viewing habits data somehow generalizes into a method for science in general. Unlike the “as-seen on TV” products I see in infomercials, statistical inference is not the only tool that I will ever need. Restricting ourselves to correlation removes one of the most powerful tools we have: prediction. Without it, scientific discovery would be hobbled.



Consider, the correlation of all of the observed information regarding plate-boundary movement (through some model of the earth) along a fault such as the San Andreas. Keep in mind that enormous amounts of data are collected in this region. Anyway, quiet areas along the fault would either imply that a particular piece of the fault were no longer seismically-active or, using anti-correlation, that the “slip deficit” suggested that a much larger earthquake was more likely to occur in the future for that zone (These areas are referred to as seismic gaps). Moreover, the Parkfield segment of the San Andreas fault has large earthquakes approximately every twenty years. A correlative model would suggest that the entire plate-boundary should be similar which is simply not true as proven by the Anza Seismic Gap. Furthermore, correlation would also have implied that another large event should have occurred along the Parkfield Gap in the late 80s. If science were only concerned with correlation, one instrument in this zone would have been sufficient. However, the diverse set of predictions made by researchers demanded a wide variety of experiments. Consequently, this zone became the most heavily instrumented area in the world in an effort to extensively study the expected large event. They had to wait for over fifteen years for this to happen. Then there are events that few would have predicted (Black Swans) such as “slow” earthquakes which require special instrumentation to capture. These phenomena, until recently, were not able to be correlated with anything and thus, never would have existed. In fact, one of the first observations of these events was attributed to instrument error.



Clearly correlation is but one approach to modeling processes amongst many. I have a theory that we in the grid community can expect to help scientists solve many different types of theoretical problems for a good long time. Now to test...

Monday, June 30, 2008

About Grid Engine Advance Reservations

By Sinisa Veseli

Advance reservation (AR) capability is one of the most important new features of the upcoming Grid Engine 6.2 release. New command line utilities allow users and administrators to submit resource reservations (qrsub), view granted reservations (qrstat), or delete reservations (qrdel). Also, some of the existing commands are getting new switches. For example, the “-ar <AR id>” option for qsub indicates that the submitted job is part of an existing advance reservation. Given that AR is new functionality, I thought that it might be useful to describe how it works using a simple example (with 6.2 Beta software).

Advance reservations can be submitted to Grid Engine by queue operators and managers, and also by a designated set of privileged users. Those users are defined in the ACL “arusers”, which by default looks as follows:



$ qconf -sul
arusers
deadlineusers
defaultdepartment


$ qconf -su arusers
name    arusers
type    ACL
fshare  0
oticket 0
entries NONE




The “arusers” ACL can be modified via the “qconf -mu” command:



$ qconf -mu arusers
veseli@tolkien.ps.uud.com modified "arusers" in userset list


$ qconf -su arusers
name    arusers
type    ACL
fshare  0
oticket 0
entries veseli




Once designated as a member of this list, the user is allowed to submit ARs to Grid Engine:



[veseli@tolkien]$ qrsub -e 0805141450.33 -pe mpi 2
Your advance reservation 3 has been granted


[veseli@tolkien]$ qrstat
ar-id   name       owner        state start at             end at               duration
-----------------------------------------------------------------------------------------
      3            veseli       r     05/14/2008 14:33:08  05/14/2008 14:50:33  00:17:25

[veseli@tolkien]$ qstat -f 
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@tolkien.ps.uud.com       BIP   2/0/4          0.04     lx24-x86      




For the sake of simplicity, in the above example we have a single queue (all.q) that has 4 job slots and a parallel environment (PE) named mpi assigned to it. After reserving 2 slots for the mpi PE, only 2 slots are left for running regular jobs until the AR shown above expires. Note that the “-e” switch for qrsub designates the requested reservation end time in the format YYMMDDhhmm.ss. It is also worth pointing out that the qstat output has changed slightly with respect to previous software releases in order to accommodate the display of existing reservations.

If we now submit several regular jobs, only 2 of them will be able to run:



[veseli@tolkien]$ qsub regular_job.sh 
Your job 15 ("regular_job.sh") has been submitted
...
[veseli@tolkien]$ qsub regular_job.sh 
Your job 19 ("regular_job.sh") has been submitted


[veseli@tolkien]$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@tolkien.ps.uud.com       BIP   2/2/4          0.03     lx24-x86      
     15 0.55500 regular_jo veseli       r     05/14/2008 14:34:32     1        
     16 0.55500 regular_jo veseli       r     05/14/2008 14:34:32     1        

############################################################################
- PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
     17 0.55500 regular_jo veseli       qw    05/14/2008 14:34:22     1        
     18 0.55500 regular_jo veseli       qw    05/14/2008 14:34:23     1        
     19 0.55500 regular_jo veseli       qw    05/14/2008 14:34:24     1        




However, if we submit jobs that are part of the existing AR, those are allowed to run, while jobs submitted earlier are still pending:



[veseli@tolkien]$ qsub -ar 3 reserved_job.sh 
Your job 20 ("reserved_job.sh") has been submitted
[veseli@tolkien]$ qsub -ar 3 reserved_job.sh 
Your job 21 ("reserved_job.sh") has been submitted


[veseli@tolkien]$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@tolkien.ps.uud.com       BIP   2/4/4          0.02     lx24-x86      
     15 0.55500 regular_jo veseli       r     05/14/2008 14:34:32     1        
     16 0.55500 regular_jo veseli       r     05/14/2008 14:34:32     1        
     20 0.55500 reserved_j veseli       r     05/14/2008 14:35:02     1        
     21 0.55500 reserved_j veseli       r     05/14/2008 14:35:02     1        

############################################################################
- PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
     17 0.55500 regular_jo veseli       qw    05/14/2008 14:34:22     1        
     18 0.55500 regular_jo veseli       qw    05/14/2008 14:34:23     1        
     19 0.55500 regular_jo veseli       qw    05/14/2008 14:34:24     1        




The above example illustrates how ARs work. As long as a particular reservation is valid, only jobs designated as part of it can utilize the resources that have been reserved.

I think that AR will prove to be an extremely valuable tool for planning grid resource usage, and I’m very pleased to see it in the new Grid Engine release.

Friday, June 6, 2008

Steaming Java

By Roderick Flores

When Rich asked us to walk through a software development process, I immediately thought back to a conversation I had with my friend Leif Wickland about building high-performance Java applications, so I emailed him asking for his best practices. We have both produced code that is as fast as, if not faster than, C compiled with optimization (for me it was using a 64-bit JRE on an x86_64 architecture with multiple cores).



That is not to say that the equivalent C code, given time spent optimizing it, could not be made to go faster. Rather, the main point is that Java is a viable HPC language. On a related note, Brian Goetz of Sun has a very interesting discussion on IBM's DeveloperWorks, Urban performance legends, revisited, on how garbage collection allows faster raw allocation performance.



However, I digress… Here is a summary of what we both came up with (in no particular order):


           
  1. It is vitally important to "measure, measure, measure" everything you do. We can offer any number of helpful hints, but the likelihood that all of them should be applied is extremely low.
  2. It is equally important to optimize only the areas of the program that are bottlenecks. Anything else is a waste of development time for no real gain.
  3. One of the simplest and most overlooked things that can help your application is to overtly mark read-only method parameters with the final modifier. Not only can it help the compiler with optimization, it is also a good way of communicating your intentions to your teammates. One thing to be aware of is that not everything declared final behaves as expected (see Is that your final answer? for more detail).
  4. If you have state shared between threads, make whatever you can final so that the VM takes no steps to ensure consistency. This is not something that we would have expected to make a difference, but it seems to help.
  5. An equally ignored practice is using the finally clause. It is very important to clean up after the code in a try block; otherwise you could leave open streams, SQL queries, or other objects lying around taking up space.
  6. Create your data structures and declare your variables early. A core goal is to avoid allocating short-lived variables. While it is true that the garbage collector may reserve memory for variables that are declared often, why make it guess your intentions? For example, if a loop is called repeatedly, there is no need to write for (int i = 0; … when you could have declared i earlier. Of course, you then have to be careful not to reset counters from inside loops.
  7. Use static for values that are constants. This may seem obvious, but not everybody does it.
  8. For loops embedded within other loops:
     • Replace your outer loop with a fixed pool of threads (see the sketch after this list). In the next release of Java this will become even easier with the fork-join framework. This has become increasingly important as processors gain more cores.
     • Make sure that your innermost loop is the longest, even if that doesn't map directly to the business goals. You shouldn't force the program to set up a new loop too often, as it wastes cycles.
     • Unroll your inner loops. This can save an enormous amount of time even if it isn't pretty; the quick test I just ran was 300% faster. If you haven't unrolled a loop before, it is pretty simple:
             

      // handle the iterations that don't fit into a full unrolled pass
      unrollRemainder = count % LOOP_UNROLL_COUNT;

      for( n = 0; n < unrollRemainder; n++ ) {
          // do some stuff here.
      }

      // then process LOOP_UNROLL_COUNT iterations per pass through the loop body
      for( n = unrollRemainder; n < count; n += LOOP_UNROLL_COUNT ) {
          // do stuff for n here
          // do stuff for n+1 here
          // do stuff for n+2 here
          // …
          // do stuff for n + LOOP_UNROLL_COUNT - 1 here
      }

      Notice that both n and unrollRemainder were declared earlier, as recommended previously.
  9. Preload all of your input data and then operate on it later. There is absolutely no reason that you should be loading data of any kind inside your main calculation code. If the data doesn't fit or belong on one machine, use a Map-Reduce approach to distribute it across the grid.
  10. Use the factory pattern to create objects:
     • Data structures can be created ahead of time, and only the necessary pieces are passed to the new object.
     • Any preloaded data can also be segmented so that only the necessary parts are passed to the new object.
     • You can avoid the allocation of short-lived variables by using constructors with the final keyword on their parameters.
     • The factory can perform some heuristic calculations to see whether a particular object should even be created for future processing.
  11. When doing calculations on a large number of floating-point values, use a byte array to store the data and a ByteBuffer to convert it to floats (also shown in the sketch after this list). This should primarily be used for read-only (input) data; if you are writing floating-point values, do so with caution, as it may take more time than using a float array. One major advantage Java has with this approach is that you can switch between big- and little-endian data rather easily.
  12. Pass fewer parameters to methods. This results in less overhead. If you can pass a static value, that is one fewer parameter.
  13. Use static methods where possible. For example, a fahrenheitToCelsius(float fahrenheit) method could easily be made static. The main advantage here is that the compiler will likely inline the function.
  14. There is some debate over whether you should make frequently called methods final. There is a strong argument not to, because the enhancement is small or nonexistent (see Urban Performance Legends or, once again, Is that your final answer?). However, my experience is that a small enhancement on a calculation that runs thousands of times can make a significant difference. Both Leif and I have seen measurable differences here. The key is to benchmark your code to be certain.
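
To make a couple of these tips concrete, here is a minimal sketch of my own (not code from Leif or from the benchmarks mentioned above) that wraps a preloaded byte array in a java.nio.ByteBuffer to read little-endian floats and then replaces the outer loop with a fixed pool of threads, one chunk of the array per task. The class and variable names (SumOfSquares, samples, chunk) are illustrative assumptions:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SumOfSquares {
    public static void main(String[] args) throws Exception {
        // Pretend this byte array was preloaded from a little-endian input file.
        byte[] raw = new byte[4 * 1000000];

        // Wrap the bytes and declare their byte order instead of swapping by hand.
        ByteBuffer buf = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
        final float[] samples = new float[raw.length / 4];
        for (int i = 0; i < samples.length; i++) {
            samples[i] = buf.getFloat();
        }

        // Replace the outer loop with a fixed pool of worker threads,
        // one chunk of the samples array per task.
        int nThreads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        int chunk = (samples.length + nThreads - 1) / nThreads;

        List<Callable<Double>> tasks = new ArrayList<Callable<Double>>();
        for (int t = 0; t < nThreads; t++) {
            final int start = t * chunk;
            final int end = Math.min(start + chunk, samples.length);
            tasks.add(new Callable<Double>() {
                public Double call() {
                    double sum = 0.0;
                    // the long, tight inner loop does the real work
                    for (int i = start; i < end; i++) {
                        sum += samples[i] * samples[i];
                    }
                    return sum;
                }
            });
        }

        // invokeAll blocks until every chunk has been processed.
        double total = 0.0;
        for (Future<Double> f : pool.invokeAll(tasks)) {
            total += f.get();
        }
        pool.shutdown();
        System.out.println("Sum of squares: " + total);
    }
}

As with everything above, measure before adopting it: for small arrays the thread-pool overhead will swamp any gain.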

Wednesday, May 14, 2008

Grid Engine 6.2 Beta Release

By Sinisa Veseli

Grid Engine 6.2 will come with some interesting new features. In addition to advance resource reservations and array job interdependencies, this release will also contain a new Service Domain Manager (SDM) module, which will allow distributing computational resources between different services, such as different Grid Engine clusters or application servers. For example, SDM will be able to withdraw unneeded machines from one cluster (or application server) and assign them to a different one, or keep them in its “spare resource pool”.

It is also worth mentioning that Grid Engine (and SDM) documentation is moving to Sun’s wiki.
The 6.2 beta release is available for download here.

Sunday, May 4, 2008

About Parallel Environments in Grid Engine

By Sinisa Veseli

Support for parallel jobs in distributed resource management software is probably one of those features that most people do not use, but those who do appreciate it a lot. Grid Engine supports parallel jobs via parallel environments (PE) that can be associated with cluster queues.

A new parallel environment is created using the qconf -ap <environment name> command and editing the configuration file that pops up. Here is an example of a PE slightly modified from the default configuration:

$ qconf -sp simple_pe
pe_name           simple_pe
slots             4
user_lists        NONE
xuser_lists       NONE
start_proc_args   /bin/true
stop_proc_args    /bin/true
allocation_rule   $round_robin
control_slaves    FALSE
job_is_first_task FALSE
urgency_slots     min


In the above example, “slots” defines the number of parallel tasks that can run concurrently. The “user_lists” (“xuser_lists”) parameter should be a comma-separated list of user names that are allowed (denied) use of the given PE. If “user_lists” is set to NONE, any user not explicitly disallowed via the “xuser_lists” parameter can use the PE.

The “start_proc_args” and “stop_proc_args” parameters represent the command lines of the startup and shutdown procedures for the parallel environment. These commands are usually scripts customized for the specific parallel library intended for a given PE. They get executed for each parallel job and are used, for example, to start any daemons that enable parallel job execution. The standard output (error) of these commands is redirected into <job name>.po(pe).<job id> files in the job’s working directory, which is usually the user’s home directory. It is worth noting that the customized PE startup and shutdown scripts can make use of several internal variables, such as $pe_hostfile and $job_id, that are relevant for the parallel job. The $pe_hostfile variable in particular points to a temporary file that contains the list of machines and parallel slots allocated for the given job. For example, setting “start_proc_args” to “/bin/cp $pe_hostfile /tmp/machines.$job_id” would copy $pe_hostfile to the /tmp directory. Some of those internal variables are also available to job scripts as environment variables. In particular, the $PE_HOSTFILE and $JOB_ID environment variables will be set and will correspond to $pe_hostfile and $job_id, respectively.

The “allocation_rule” parameter helps the scheduler decide how to distribute parallel processes among the available machines. It can take an integer that fixes the number of processes per host, or special rules like $pe_slots (all processes have to be allocated on a single host), $fill_up (start filling up slots on the best suitable host, and continue until all slots are allocated), and $round_robin (allocate slots one by one on each allocated host in a round-robin fashion until all slots are filled).

The “control_slaves” parameter is slightly confusing. It indicates whether or not the Grid Engine execution daemon creates the parallel tasks for a given application. In most cases (e.g., for MPI or PVM) this parameter should be set to FALSE, as custom Grid Engine PE interfaces are required for such control of parallel tasks to work. Similarly, the “job_is_first_task” parameter is only relevant if control_slaves is set to TRUE. It indicates whether or not the original job script submitted for execution is part of the parallel program.

The “urgency_slots” parameter is used for jobs that request a range of parallel slots. If an integer value is specified, that number is used as the prospective slot amount. If “min”, “max”, or “avg” is specified, the prospective slot amount is determined as the minimum, maximum, or average of the slot range, respectively.

After a parallel environment is configured and added to the system, it can be associated with any existing queue by setting the “pe_list” parameter in the queue configuration, and at this point users should be able to submit parallel jobs. On the GE project site one can find a number of nice How-To documents related to integrating various parallel libraries. If you do not have the patience to build and configure one of those, but would still like to see how stuff works, you can try adding a simple PE (like the one shown above) to one of your queues and use a simple ssh-based master script to spawn and wait on the slave tasks:

#!/bin/sh
#$ -S /bin/sh
slaveCnt=0
# each line of $PE_HOSTFILE contains: hostname slots queue processor-range
while read host slots q procs; do
    slotCnt=0
    while [ $slotCnt -lt $slots ]; do
        slotCnt=`expr $slotCnt + 1`
        slaveCnt=`expr $slaveCnt + 1`
        # start one "slave" task per allocated slot on the given host
        ssh $host "/bin/hostname; sleep 10" > /tmp/slave.$slaveCnt.out 2>&1 &
    done
done < $PE_HOSTFILE
# wait for all of the background ssh tasks to finish
while [ $slaveCnt -gt 0 ]; do
    wait
    slaveCnt=`expr $slaveCnt - 1`
done
echo "All done!"

After saving this script as "master.sh" and submitting your job using something like "qsub -pe simple_pe 3 master.sh" (where 3 is the number of parallel slots requested), you should be able to see your "slave" tasks running on the allocated machines. Note, however, that you must have password-less ssh access to the designated parallel compute hosts in order for the above script to work.

Wednesday, April 30, 2008

The Role of Open Source in Grid Computing

Rich Wellner

Grid Guru Ian Foster has a great piece in International Science Grid This Week. He talks about the significance of choosing open source licenses in the history of Globus, leading to a field dominated by open source software.

Tuesday, April 29, 2008

The MapReduce Panacea Myth?

By Roderick Flores

Everywhere I go I read about how the MapReduce algorithm will change, and continues to change, the world with its pure simplicity… Parallel programming is hard, but MapReduce makes it easy... MapReduce: ridiculously easy distributed programming… Perhaps one day programming tools and languages will catch up with our processing capability, but until then MapReduce will allow us all to process very large datasets on massively parallel systems without having to bother with complicated interprocess communication using MPI. 



I am a skeptic, which is not to say I have anything against a generalized framework for distributing data to a large number of processors. Nor does it imply that I enjoy MPI and its coherence arising from cacophonous chatter (if all goes well). I just don’t think MapReduce is particularly "simple". The key promoters of this algorithm, such as Yahoo and Google, have serious experts MapReducing their particular problem sets, and thus they make it look easy. You and your colleagues need to understand your data in some detail as well. I can think of a number of examples of why this is so.



First, let’s say that you are tasked with processing thousands of channels of continuously recorded broadband data from a VLBI-based radio telescope (or any other processing that uses beam-forming techniques, for that matter). You cannot simply chop the data into nice time-based sections and send it off to be processed. Any signal processing that must be done to the data will produce terrible edge effects at each of the abrupt boundaries. Your file splits must do something to avoid this behavior, such as padding additional data on either side of the cut. This in turn complicates the append phase after the processing is done: you need to properly remove the padded data, because if the samples do not align in a coherent way, you will introduce a spike filled with energy into your result.



Alternatively, you might have been tasked with solving a large system of linear equations. For example, say you are asked to produce a regional seismic tomography map with a resolution down to a few hundred meters using thousands of earthquakes, each with tens of observations. You could easily produce a sparse system of equations that creates a matrix with something on the order of one million columns and several tens, if not hundreds, of thousands of rows. Distributed algorithms for solving such a system are well known but require our cranky friend MPI. However, we can map this problem to several independent calculations as long as we are careful not to bias the input data, as in the previous example. I will not bore you with the possibilities, but suffice it to say that researchers have been producing tomographic maps for many years by carefully selecting the data and the model calculated at any one time.



I know what many of you are thinking, because I’ve read it before: MapReduce is meant for "non-scientific" problems. But is a sophisticated search engine any different? What makes it any less "scientific" than the examples I provided? Consider a search engine that maintains several (n) different document indexes distributed throughout the cloud. A user issues a query, which is mapped to the n servers. Let’s assume that, for the sake of time, each node returns only its top m results to the reduce phase. These results are then sorted and the best are returned to the user. The assumption here is that there is no bias in the distribution of indexed documents relevant to a user’s query. Perhaps one or more documents beyond the first m found in one particular index are far more relevant than the other (n - 1) * m results from the other indexes, but the user will never know. Should the search engine return every single result to the reduce phase at the expense of response time? Is there a way to distribute documents to the individual indexes to avoid well-known (but not all) biases? I suggest that these questions are the sorts of things that give one search engine an edge over another. Approaches to these sorts of issues might well be publishable in refereed journals. In other words, it sounds scientific to me.
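
As a rough illustration of that reduce step, here is a small sketch of my own, with hypothetical names (Hit, TopMReduce) rather than anyone's production code. It makes the bias concrete: a document ranked m+1 on one index never even reaches the merged list, however relevant it might be.

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Toy reduce phase: merge the top-m hits returned by each of the n index servers.
public class TopMReduce {

    static class Hit {
        final String doc;
        final double score;
        Hit(String doc, double score) { this.doc = doc; this.score = score; }
    }

    // Each inner list is what one "map" node returned: its local top m hits.
    static List<Hit> reduce(List<List<Hit>> perIndexTopM, int m) {
        List<Hit> merged = new ArrayList<Hit>();
        for (List<Hit> partial : perIndexTopM) {
            merged.addAll(partial);   // only n * m candidates ever arrive here
        }
        Collections.sort(merged, new Comparator<Hit>() {
            public int compare(Hit a, Hit b) {
                return Double.compare(b.score, a.score);   // highest score first
            }
        });
        return merged.subList(0, Math.min(m, merged.size()));
    }
}

Whether to send more than m candidates per node, and how to shard documents so the per-index rankings are not biased, is exactly the kind of tuning I am pointing at.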



I hope that by now you can see why I say that using MapReduce is only simple if you know how to work with (map) your data, especially if it is wonderfully wacky. There is an inherent risk of bias in any MapReduce algorithm. Sadly, this implies that processing data in parallel is still hard, no matter how good a programmer you are or how sophisticated your programming language is.