Tuesday, April 8, 2008

The Canon of Clouds

By Rich Wellner

The emergence of cloud computing as a resource on the grid has led to a huge resurgence in interest in utility computing. Looking at the history of utility computing allows us to identify three canonical interaction models that also apply to cloud computing.



  • Metascheduling
  • Virtual machines
  • Application virtualization


Metascheduling



Initial cloud offerings like Amazon Elastic Compute Cloud created the nomenclature around clouds. Going back before the term "cloud" was coined we see a similar offering from Sun with their utility computing offering. In both cases users submit work to the service and eventually get results returned. How the request gets prioritized, provisioned and executed is at the discretion of the service provider. In many ways this is similar to how a typical cluster works. A user selects a cluster, submits a job and waits for a response. What node is used to execute his request is largely out of his control. While acknowledging there are substantial difference between a cluster and a cloud, another similarity reveals itself when thinking about how users interact with compute resources in companies that operate multiple clusters.



As companies began adding additional clusters, users quickly demanded a facility to submit their jobs to a high level service that would manage the interactions with all the clusters that were available. Most users didn't want to have to themselves use multiple monitoring tools to access multiple clusters and use the information gathered to make a decision about where to submit their job. What they wanted was a single interface to submit jobs to and a service that would make policy based decisions about which cluster to ultimately submit the request.



The situation today is similar. Multiple cloud and utility computing vendors exist and users don't want to spend their time gathering information about the state of each in order to decide where to submit their jobs. Further, administrators and managers need to be able to enforce policy. There are several reasons for requiring this behavior, but probably the easiest to explain is that there are costs associated with resource usage at the cloud vendors and organizations require control over how that money is spent.



The answer to all these needs is to place a metascheduler between the users and the various resources. Users can then use a single interface for all their jobs regardless of where they are ultimately going to be executed.



[A metascheduler] enables large-scale, reliable and efficient sharing of computing resources (clusters, computing farms, servers, supercomputers…), managed by different LRM (Local Resource Management) systems, such as PBS, SGE, LSF, Condor…, within a single organization (enterprise grid) or scattered across several administrative domains (partner or supply-chain grid). -- GridWay


Virtual machines



Clouds are only as useful as the software running in them. Therefore, the next important interaction model is that between users and virtual machines.



Users often need very specific software stacks. This includes the application they are running, support libraries and, in some instances, specific versions of operating systems. Analysts are saying that there are now at least 35 companies addressing the needs of users in managing these interactions. This includes software to implement the enactment layer, manage images, policy engines, user portals and analytics functions.



One of the questions yet to be answered in the cloud community is how to allow users to make use of several clouds on a day to day basis. As this market continues to mature, look for many of the same challenges (e.g. security, common APIs, WAN latencies) that the grid community has been tackling for over a decade to become increasingly important to cloud users.



Application virtualization



In the context of clouds, application virtualization gains significant power by being able to add or remove instances of applications on demand. This is currently being done in the context of data center management using proprietary tools. Clouds present a cool new opportunity to do the balancing act on a regional basis. As more clouds are built and standard interfaces made available, users will be able to load balance to multiple clouds operating in different countries or cities as demand grows and shrinks.



These three models represent established, powerful interaction modes that are being used in production in a variety of settings today. It will be interesting over the next year to see which cloud operators adopt which models and how many lessons they take from existing non-cloud implementation versus trying to reinvent the wheel in a new way.

No comments:

Post a Comment