Wednesday, October 29, 2008

Elastic Management of Computing Clusters

By Ignacio Martin Llorente

Beyond all the hype, clouds (i.e. services for the on-demand provision of virtual machines; others would say IaaS) are making utility computing a reality; see for example the Amazon EC2 case studies. This new model, and virtualization technologies in general, is also being actively explored by the scientific community. There are quite a few initiatives that integrate virtualization with a range of computing platforms, from clusters to Grid infrastructures. Once this integration is achieved, the natural next step is to jump to the clouds and provision the VMs from an external site. For example, recent work from UNIVA UD has demonstrated the feasibility of supplementing a UNIVA Express cluster with EC2 resources (you can download the whitepaper to learn more).


OpenNebula virtual infrastructure engine components and its integration with Amazon EC2


This cloud provisioning model can be further integrated with the in-house physical infrastructure when combined with a virtual machine (VM) management system, such as OpenNebula.
A VM manager is responsible for the efficient management of the virtual
infrastructure as a whole, by providing basic functionality for the
deployment, control and monitoring of VMs on a distributed pool of
resources. The use of this new virtualization layer decouples the
computing cluster from the physical infrastructure, and so extends the
classical benefits of VMs to the cluster level (i.e. cluster
consolidation, cluster isolation, cluster partitioning and elastic
cluster capacity).


Architecture of an Elastic Cluster

A computing cluster can be easily virtualized by putting the front-end and worker nodes into VMs. In our case, the virtual cluster front-end (SGE master host) is deployed on the local resources, with Internet connectivity so that it can communicate with the Amazon EC2 VMs. This cluster front-end also acts as the NFS and NIS server for every worker node in the virtual cluster.
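
The post does not detail the NFS configuration, but as a minimal sketch, the exports on the front-end could look like the lines below; the exported directories and the two address ranges (a 192.168.1.0/24 local vLAN and a 10.8.0.0/24 VPN subnet) are assumptions for illustration only, not values from the actual set-up:

# /etc/exports on the cluster front-end (hypothetical paths and subnets)
/home     192.168.1.0/24(rw,sync,no_subtree_check) 10.8.0.0/24(rw,sync,no_subtree_check)
/opt/sge  192.168.1.0/24(rw,sync,no_subtree_check) 10.8.0.0/24(rw,sync,no_subtree_check)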


The virtual worker nodes communicate with the front-end through a private local area network. The local worker nodes are connected to this vLAN through a virtual bridge configured in every physical host. The EC2 worker nodes are connected to the vLAN with an OpenVPN tunnel, which is established between each remote node (OpenVPN client) and the cluster front-end (OpenVPN server). With this configuration, every worker node (either local or remote) can communicate with the front-end and use the common network services transparently. The architecture of the cluster is shown in the following figure; a minimal OpenVPN configuration sketch follows the figure:


Virtual Cluster Architecture

Figure courtesy of Prof. Rafael Moreno
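
The original post does not include the OpenVPN configuration itself. As a rough sketch, a routed tunnel between the front-end and each EC2 worker node could be set up along these lines; the 10.8.0.0/24 tunnel subnet, the certificate file names and the front-end address are placeholders:

# server.conf on the cluster front-end (placeholder values)
port 1194
proto udp
dev tun
server 10.8.0.0 255.255.255.0
ca ca.crt
cert frontend.crt
key frontend.key
dh dh1024.pem
keepalive 10 120
persist-key
persist-tun

# client.conf on each EC2 worker node (placeholder values)
client
dev tun
proto udp
remote frontend.example.org 1194
ca ca.crt
cert worker.crt
key worker.key
keepalive 10 120
persist-key
persist-tun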


Deploying an SGE cluster with OpenNebula and Amazon EC2

The latest release of OpenNebula includes a driver to deploy VMs in the EC2 cloud, and so it integrates the Amazon infrastructure with your local resources. EC2 is managed by OpenNebula just like another local resource, with a configurable fixed size that limits the cluster capacity (i.e. the number of SGE worker nodes) that can be allocated in the cloud. In this set-up, your resources would look as follows:


>onehost list
HID NAME     RVM      TCPU   FCPU   ACPU    TMEM    FMEM STAT
   0 ursa01     0       800    798    800 8387584 7663616  off
   1 ursa02     0       800    798    800 8387584 7663616  off
   2 ursa03     0       800    798    800 8387584 7663616  on
   3 ursa04     2       800    798    600 8387584 6290432  on
   4 ursa05     1       800    799    700 8387584 7339008  on
   5 ec2        0       500    500    500 8912896 8912896  on

The last line corresponds to EC2, currently configured to host up to 5 m1.small instances.


The OpenNebula EC2 driver translates a general VM deployment file into an EC2 instance description. The driver assumes that a suitable Amazon machine image (AMI) has been previously packed and registered in the S3 storage service (a sketch of this step is given after the template), so when a given VM is to be deployed in EC2 its AMI counterpart is instantiated. A typical SGE worker node VM template would look like this:


NAME   = sge_workernode
CPU    = 1
MEMORY = 128                                                            

#Xen or KVM template machine, used when deploying in the local resources
OS   = [kernel="/vmlinuz",initrd="/initrd.img",root="sda1"]
DISK = [source="/images/sge/workernode.img",target="sda",readonly="no"]
DISK = [source="/images/sge/workernode.swap",target="sdb",readonly="no"]
NIC  = [bridge="eth0"]

#EC2 machine template, this will be used when submitting this VM to EC2
EC2 = [ AMI="ami-d5c226bc",
        KEYPAIR="gsg-keypair",
        AUTHORIZED_PORTS="22",
        INSTANCETYPE=m1.small]
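
The AMI referenced in the template (ami-d5c226bc) has to be packed and registered beforehand. A rough sketch of that step with the Amazon AMI and API tools of the time is shown below; the bucket name, key files and account credentials are placeholders:

# bundle the worker node image, upload it to S3 and register it (placeholder values)
ec2-bundle-image -i workernode.img -k pk.pem -c cert.pem -u <account_id>
ec2-upload-bundle -b my-sge-images -m /tmp/workernode.img.manifest.xml -a <access_key> -s <secret_key>
ec2-register my-sge-images/workernode.img.manifest.xml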

Once deployed, the cluster would look like this (SGE master, 2 local worker nodes and 2 EC2 worker nodes):


>onevm list
  ID      NAME STAT CPU     MEM        HOSTNAME        TIME
  27  sgemast runn 100 1232896          ursa05 00 00:41:57
  28  sgework runn 100 1232896          ursa04 00 00:31:45
  29  sgework runn 100 1232896          ursa04 00 00:32:33
  30  sgework runn   0       0             ec2 00 00:23:12
  31  sgework runn   0       0             ec2 00 00:21:02

You can get additional information about your EC2 VMs, such as their IP addresses, using the onevm show command:
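
For example, to inspect one of the EC2 worker nodes from the listing above:

>onevm show 30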


So, it is easy to manage your virtual cluster with OpenNebula and EC2, but what about efficiency? Besides the inherent overhead induced by virtualization (around 10% for processing), the average deployment time of a remote EC2 worker node is 23.6s, while a local one takes only 3.3s. Moreover, when executing an HTC workload, the overhead induced by using EC2 (the VPN and a slower network connection) can be neglected.


Ruben S. Montero


This is joint work with Rafael Moreno and Ignacio M. Llorente


Reprinted from blog.dsa-research.org 
