Tuesday, November 27, 2007

Unleash the Monster: Distributed Virtual Resource Management

By Roderick Flores

Recently we explored the concept of a virtualized grid: a system where the computation environment resides on virtualized operating environments. This approach simplifies the support of the grid-user community's specialized needs. Further, we discussed the networking difficulties that arise from instantiating systems on the fly, including routing and the increased network load that distributing images to hypervisors would create. However, we have not yet discussed how these virtualized computational environments would come into being for users at the right time.



The dominant distributed resource management (DRM) products do not interact with hypervisors to create virtual machines (VMs). Two notable exceptions are Moab from Cluster Resources and GridMP from Univa UD.  Moab supports virtualization on specific nodes using the node control command (mnodectl).  Even so, VMs are not created on available nodes as they are needed.



Consequently, grid users who wish to execute their jobs in a custom execution environment must follow a procedure like the one below (a minimal sketch of which appears after the list):



  • Determine which nodes were provided by the DRM's scheduler.  If any of these nodes are running default VMs for other processes, those VMs may need to be modified or suspended in order to free up resources;
  • Create a set of virtual machines on the provided nodes;
  • Distribute the computation jobs to each of those machines once they are known to have entered a usable state;
  • Monitor the computation jobs for completion; and
  • Finally, once the jobs are known to be complete, tear down the VMs and, if required, restore any VMs that existed beforehand.
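
To make the burden concrete, here is a minimal sketch of what such a wrapper script might look like.  The helper functions (allocate_nodes, create_vm, vm_ready, submit_job, job_finished, destroy_vm) are hypothetical placeholders for whatever site-specific DRM and hypervisor tooling is available; they are not part of any existing product.

    import time

    def run_on_custom_environment(image, job_script, node_count):
        # Ask the DRM scheduler for physical hosts (hypothetical helper).
        nodes = allocate_nodes(node_count)

        # Boot one VM per host and wait until every VM reports a usable state.
        vms = [create_vm(node, image) for node in nodes]
        while not all(vm_ready(vm) for vm in vms):
            time.sleep(30)

        # Distribute the computation jobs and poll until they all finish.
        jobs = [submit_job(vm, job_script) for vm in vms]
        while not all(job_finished(job) for job in jobs):
            time.sleep(60)

        # Tear down the VMs (and restore any pre-existing default VMs).
        for vm in vms:
            destroy_vm(vm)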


Sadly, the onus is on the user to guarantee that there are sufficient images for the number of requested nodes.  The user must also notify the DRM which resources the VMs will consume during the computation.  If this is not done, additional processes could be started on the same node and resource contention could result.



In addition to these extra responsibilities, grid users also lose many of the advantages that resource managers typically offer.  There is no efficiency gained from managing the VMs' resources beyond their single use: if a particular environment could be reused, that reuse must be managed by the user.  Also, the DRM can only preempt the wrapper job that started the virtual machines, not the computational jobs running inside them; if the wrapper is preempted, neither the computational jobs nor the VMs are actually affected.  If other jobs normally run on a default VM on the same node, this mismatch can cause problems.  Finally, the user may lose some of the more sophisticated capabilities built into the resource manager (such as control over parallel environments).



All of these issues could be solved by tightly integrating the DRM with the dominant VM hypervisors.  The DRM should be able to start, shut down, suspend, and modify virtual environments on any of the nodes under its control.  It should also be able to query the state of each physical machine and all of the VMs running on it.  Ideally, the industry or our community would come to a consensus on an interface that every hypervisor should expose to the DRM; a rough sketch of such an interface follows.  If we put our minds to it, we could describe any number of useful features that a DRM could provide when integrated with virtual machine managers; these concepts simply need to be realized to make this architecture feasible.
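
As an illustration only, such a hypervisor-facing interface might boil down to a handful of operations.  The class and method names below are assumptions made for this sketch and do not correspond to any existing hypervisor or DRM API.

    from abc import ABC, abstractmethod

    class HypervisorInterface(ABC):
        """Operations a DRM would need every hypervisor to expose (hypothetical)."""

        @abstractmethod
        def start_vm(self, image_path, cpus, memory_mb):
            """Boot a VM from an image with the given resource limits."""

        @abstractmethod
        def shutdown_vm(self, vm_id):
            """Cleanly stop a running VM."""

        @abstractmethod
        def suspend_vm(self, vm_id):
            """Pause a VM so its resources can be reclaimed or migrated."""

        @abstractmethod
        def resume_vm(self, vm_id):
            """Resume a previously suspended VM."""

        @abstractmethod
        def modify_vm(self, vm_id, cpus=None, memory_mb=None):
            """Adjust the resource allocation of a running VM."""

        @abstractmethod
        def node_state(self):
            """Report free CPU and memory on the physical node and list its VMs."""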



Here are my thoughts about what a resource manager in a virtualized environment might provide:



  • It could roll back an image to its initial state after a secure process has executed on it.
  • It could be aware of the resource limits of each VM so that it could schedule multiple virtual machines per physical node as efficiently as possible (see the placement sketch after this list).
  • It should distinguish between access-controlled VMs and public instances to which it may schedule any job.
  • It should stage the booting of VMs so that transferring operating system images does not flood the network (see the staging sketch after this list).  A sophisticated DRM might even transport images to local storage before the node's primary resources are free.  Readers of the previous posts will recall that the hypervisor interactions should be on a segregated network so as not to interfere with the computational traffic.
  • It could suspend VMs as an alternative to preempting jobs.  Similarly, it could suspend a VM, transport its image to another physical node, and restart it there.  If the DRM managed output files as resources, it could prohibit other processes from writing to files still held open by the suspended systems.
  • It could run specialized servers for two-tier applications and modify the resource allocation of a VM should it become resource constrained.
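
To illustrate the second point, here is a toy first-fit placement routine showing how a DRM that knows each VM's resource limits could pack several VMs onto one physical node.  The VM and Node classes are hypothetical, and a real scheduler would weigh far more factors (network locality, licenses, affinity, preemption policy).

    from dataclasses import dataclass, field

    @dataclass
    class VM:
        name: str
        cpus: int
        memory_mb: int

    @dataclass
    class Node:
        name: str
        free_cpus: int
        free_memory_mb: int
        placed: list = field(default_factory=list)

    def place_vms(vms, nodes):
        """Assign each VM to the first node with enough spare CPU and memory."""
        unplaced = []
        for vm in vms:
            for node in nodes:
                if node.free_cpus >= vm.cpus and node.free_memory_mb >= vm.memory_mb:
                    node.free_cpus -= vm.cpus
                    node.free_memory_mb -= vm.memory_mb
                    node.placed.append(vm.name)
                    break
            else:
                unplaced.append(vm.name)   # no node can host this VM right now
        return unplaced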
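
Likewise, staging the image transfers can be as simple as capping how many copies are in flight at once over the segregated hypervisor network.  In this sketch, copy_image is a hypothetical stand-in for whatever transport a site actually uses (scp, multicast, a SAN snapshot, and so on).

    import threading

    MAX_CONCURRENT_TRANSFERS = 4                  # assumed cap; tune per network
    transfer_slots = threading.Semaphore(MAX_CONCURRENT_TRANSFERS)

    def copy_image(image, node):
        """Placeholder for the site-specific image transport."""
        pass

    def stage_image(image, node):
        with transfer_slots:                      # at most 4 transfers at a time
            copy_image(image, node)

    def stage_all(image, nodes):
        threads = [threading.Thread(target=stage_image, args=(image, n)) for n in nodes]
        for t in threads:
            t.start()
        for t in threads:
            t.join()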


I am sure that other grid managers could improve on this list as well as extend it with other excellent ideas.



In summary, we have examined the flexibility that a grid with virtualized nodes provides.  As clusters evolve from dedicated systems serving a homogeneous user community into grids serving a diverse set of user requirements, I believe that grid managers will require the virtualized environment we have been exploring.  Clearly the key to creating this capability is to integrate hypervisor control into our resource managers; without it, VM management is simply too complicated for the scale we are targeting.



Thus far, nothing that we have explored helps us manage and describe the dynamic system that this framework requires (as I am sure you have noticed).  Is this architecture a Frankenstein's monster that will turn on its creators?  Next time we will explore how we might monitor, and create reports for, a system that changes from one moment to the next.
