Thursday, April 10, 2008

Will You Move the Jobs or the Nodes?

By Ignacio Martin Llorente

The history of High Performance Computing (HPC) can be told as a continuous search for the optimal use of the underlying hardware platform. As new architectures are delivered with impressive peak performance specifications, a parallel effort is made by the HPC community to exploit these new machines. This has been the case from the development of message passing frameworks in the early 80's to today's challenges of multi-core processors.



An important piece in this quest is the Distributed Resource Management System (DRMS). A DRMS can be roughly defined as a software component that provides a uniform, single view of a set of (possibly heterogeneous) computational resources. A DRMS can be found in any distributed system, from tightly coupled MPPs to Grid or even peer-to-peer environments. The goal of the DRMS in any of these platforms is to find an optimal assignment (in terms of a given target, such as utilization or wall-time) between a computational workload and the computational resources.



Virtualization has opened up new avenues for resource management techniques, e.g. server consolidation and isolation, or custom execution environment provisioning. Probably its most exciting feature, though, is the ability to dynamically shape a hardware infrastructure: the same set of hardware blades can be shaped into a web server and a database server, a set of cluster worker nodes for scientific computing, or a set of workstations for a virtual classroom.



In this way, virtualization has completely changed the habitat of the DRMS species. In fact, its goal can be reformulated the other way around: find the optimal computational resources to execute a given computational workload. Traditionally a DRMS assigns pending jobs to local resources and, lately, with the advent of Grid Computing, also to remote resources in other administrative domains.



There are two alternatives today to extend the limits of a computing cluster:

  • Move the jobs: forward pending jobs to remote resources outside the cluster, typically through grid interfaces and a metascheduler.
  • Move the nodes: enlarge the cluster itself with new virtual worker nodes deployed on remote or repurposed resources.

The generalization of this second approach, and its application to any kind of service, requires a new component class for resource management. This new component can be referred to as an Infrastructure Management System (IMS); there is no clear consensus on its name, and virtual environment manager, virtualization manager or VM manager are also used. An IMS dynamically shapes a physical infrastructure by deploying virtual machines to adapt its configuration to the services it supports and their current load. Given the characteristics of VMs (flexibility, security, portability...), the IMS will be the key resource management component for the next-generation resource provisioning paradigm, i.e. cloud computing.
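
To make the role of an IMS a little more concrete, here is a minimal sketch, in Python, of the kind of elasticity loop such a component might run. The functions pending_jobs(), free_slots(), deploy_worker_vm() and shutdown_worker_vm() are hypothetical stand-ins for calls to the local DRMS and to the VM manager; they are not the API of any real product.

```python
# Minimal sketch of an IMS elasticity loop (hypothetical DRMS/VM-manager API).
import time

SLOTS_PER_VM = 4       # computing slots contributed by each virtual worker node
CHECK_INTERVAL = 60    # seconds between reconfiguration decisions

def reconcile(drms, vm_manager):
    """Grow or shrink the pool of virtual worker nodes to match the workload."""
    pending = drms.pending_jobs()
    free = drms.free_slots()
    if pending > free:
        # Not enough slots: shape more physical hosts into cluster worker nodes.
        needed_vms = -(-(pending - free) // SLOTS_PER_VM)   # ceiling division
        for _ in range(needed_vms):
            vm_manager.deploy_worker_vm(image="cluster-worker")
    elif pending == 0 and free >= SLOTS_PER_VM:
        # Idle capacity: release a worker node so the host can be reshaped
        # for another service (web server, virtual classroom, ...).
        vm_manager.shutdown_worker_vm()

def run(drms, vm_manager):
    while True:
        reconcile(drms, vm_manager)
        time.sleep(CHECK_INTERVAL)
```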




Some examples include commercial products like IBM Virtualization Manager, Platform Orchestrator or VMware VirtualCenter, and open source initiatives like the OpenNEbula Virtual Infrastructure Engine. OpenNEbula allows a physical cluster to dynamically execute multiple virtual clusters, providing on-demand resource provisioning: the number of worker nodes can grow with user demand, so that computing slots are always available. So the question is: are you going to move the jobs or the nodes?



Ruben S. Montero

Tuesday, April 8, 2008

The Canon of Clouds

By Rich Wellner

The emergence of cloud computing as a resource on the grid has led to a huge resurgence in interest in utility computing. Looking at the history of utility computing allows us to identify three canonical interaction models that also apply to cloud computing.



  • Metascheduling
  • Virtual machines
  • Application virtualization


Metascheduling



Initial cloud offerings like Amazon Elastic Compute Cloud created the nomenclature around clouds. Going back before the term "cloud" was coined, we see a similar service from Sun with their utility computing offering. In both cases users submit work to the service and eventually get results back. How the request gets prioritized, provisioned and executed is at the discretion of the service provider. In many ways this is similar to how a typical cluster works: a user selects a cluster, submits a job and waits for a response. Which node is used to execute the request is largely out of the user's control. While acknowledging that there are substantial differences between a cluster and a cloud, another similarity reveals itself when thinking about how users interact with compute resources in companies that operate multiple clusters.



As companies began adding additional clusters, users quickly demanded a facility to submit their jobs to a higher-level service that would manage the interactions with all the available clusters. Most users didn't want to use multiple monitoring tools to access multiple clusters and then use the gathered information to decide where to submit their job. What they wanted was a single interface to submit jobs to, backed by a service that would make policy-based decisions about which cluster the request should ultimately be submitted to.



The situation today is similar. Multiple cloud and utility computing vendors exist, and users don't want to spend their time gathering information about the state of each in order to decide where to submit their jobs. Further, administrators and managers need to be able to enforce policy. There are several reasons for requiring this behavior, but probably the easiest to explain is that resource usage at the cloud vendors costs money, and organizations require control over how that money is spent.



The answer to all these needs is to place a metascheduler between the users and the various resources. Users can then use a single interface for all their jobs regardless of where they are ultimately going to be executed.



[A metascheduler] enables large-scale, reliable and efficient sharing of computing resources (clusters, computing farms, servers, supercomputers…), managed by different LRM (Local Resource Management) systems, such as PBS, SGE, LSF, Condor…, within a single organization (enterprise grid) or scattered across several administrative domains (partner or supply-chain grid). -- GridWay
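
As a rough illustration of the kind of policy-based placement decision a metascheduler automates, here is a toy sketch in Python. The resource records and the cost and load fields are invented for the example; a real metascheduler such as GridWay evaluates much richer information pulled from each LRM or cloud interface.

```python
# Toy placement policy: pick the cheapest resource that can run the job,
# breaking ties by current load. The Resource records are invented for the
# example; a real metascheduler pulls this state from each LRM or cloud API.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Resource:
    name: str
    free_slots: int
    cost_per_hour: float   # 0.0 for in-house clusters
    load: float            # 0.0 (idle) .. 1.0 (saturated)

@dataclass
class Job:
    slots: int
    max_cost_per_hour: float

def place(job: Job, resources: List[Resource]) -> Optional[Resource]:
    candidates = [r for r in resources
                  if r.free_slots >= job.slots
                  and r.cost_per_hour <= job.max_cost_per_hour]
    if not candidates:
        return None  # leave the job pending
    return min(candidates, key=lambda r: (r.cost_per_hour, r.load))

if __name__ == "__main__":
    pool = [Resource("local-cluster", 2, 0.0, 0.9),
            Resource("partner-grid", 16, 0.0, 0.4),
            Resource("ec2-cloud", 64, 0.10, 0.1)]
    print(place(Job(slots=8, max_cost_per_hour=0.05), pool).name)  # partner-grid
```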


Virtual machines



Clouds are only as useful as the software running in them. Therefore, the next important interaction model is that between users and virtual machines.



Users often need very specific software stacks: the application they are running, supporting libraries and, in some instances, specific versions of operating systems. Analysts say there are now at least 35 companies addressing users' needs in managing these interactions, with software covering the enactment layer, image management, policy engines, user portals and analytics.



One of the questions yet to be answered in the cloud community is how to allow users to make use of several clouds on a day-to-day basis. As this market continues to mature, look for many of the same challenges (e.g. security, common APIs, WAN latencies) that the grid community has been tackling for over a decade to become increasingly important to cloud users.



Application virtualization



In the context of clouds, application virtualization gains significant power by being able to add or remove instances of applications on demand. This is currently being done in the context of data center management using proprietary tools. Clouds present a cool new opportunity to do the balancing act on a regional basis. As more clouds are built and standard interfaces made available, users will be able to load balance to multiple clouds operating in different countries or cities as demand grows and shrinks.
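
A toy sketch of that balancing act, assuming a uniform per-instance capacity; the region names, capacities and the notion of a start/stop call are assumptions made for this example, not any particular cloud's API.

```python
# Toy regional scaler: size each region's instance count to its observed demand.
from dataclasses import dataclass
from typing import Dict

REQUESTS_PER_INSTANCE = 500        # assumed capacity of one application instance

@dataclass
class Region:
    instances: int = 1
    demand_rps: float = 0.0        # observed requests per second in this region

def rebalance(regions: Dict[str, Region]) -> None:
    for region in regions.values():
        wanted = max(1, -(-int(region.demand_rps) // REQUESTS_PER_INSTANCE))
        # A real implementation would call the cloud's start/stop-instance API
        # here; the sketch only records the target size.
        region.instances = wanted

regions = {"us-west": Region(demand_rps=2300), "eu": Region(demand_rps=400)}
rebalance(regions)
print({name: r.instances for name, r in regions.items()})   # {'us-west': 5, 'eu': 1}
```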



These three models represent established, powerful interaction modes that are being used in production in a variety of settings today. It will be interesting over the next year to see which cloud operators adopt which models, and how many lessons they take from existing non-cloud implementations versus trying to reinvent the wheel in a new way.

Saturday, April 5, 2008

Grid Engine 6.1u4 Release

By Sinisa Veseli

The Grid Engine project has announced a new maintenance release (version 6.1 Update 4) of its software. This release fixes more than 50 issues found in earlier versions. In particular, a couple of problems causing qmaster crashes have been resolved, as have several dbwriter and accounting issues. More specifically, if you were wondering why your array job task usage is missing from the accounting file, you should consider installing this release. The new version of the software is available for download here.

Monday, March 17, 2008

All Jobs Are Not Created Equal

By Sinisa Veseli

Choosing distributed resource management (DRM) software may not be a simple task. There are a number of open source and commercial software packages available, and companies usually go through a product evaluation phase in which they consider factors like software license and support costs, maintenance issues, their own use cases, and their existing and planned infrastructure. After following this (possibly lengthy) procedure, finally making the decision, and purchasing and installing the product, you should also make sure that the DRM software configuration fits your cluster usage and needs. In particular, designing an appropriate queue structure and configuring resources, resource management and scheduling policies are some of the most important aspects of your cluster configuration.

At first glance, devoting your company's resources to something like queue design might seem unnecessary. After all, how can one go wrong with the usual "short", "medium" and "long" queues? However, the bigger your organization is and the more diverse your users' computing needs are, the more likely it is that you would benefit from investing some time in designing and implementing your queues more carefully.

My favorite example here involves high-priority jobs that must be completed in a relatively short period of time, regardless of how busy the cluster is. Such jobs must be allowed to preempt computing resources from lower-priority jobs that are already running. Better DRMs usually allow for such a use case (e.g., by configuring "preemptive scheduling" in LSF, or by using "subordinated queues" in Grid Engine), but this is clearly something that has to be thought through well before it can be implemented.
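
To illustrate the idea (and only the idea: the real mechanisms are configured in the DRM itself, not coded by hand), here is a toy Python sketch of slot-level preemption, where a high-priority arrival suspends a running low-priority job and the victim resumes once a slot frees up.

```python
# Toy preemption model: high-priority jobs suspend running low-priority jobs
# when no slots are free; suspended jobs resume as slots open up again.
from collections import deque

class Node:
    def __init__(self, slots):
        self.slots = slots
        self.running = []          # list of (job, priority) tuples
        self.suspended = deque()   # preempted low-priority jobs

    def submit(self, job, priority):
        if len(self.running) < self.slots:
            self.running.append((job, priority))
        elif priority == "high":
            # Suspend a low-priority job to make room immediately.
            victim = next((j for j in self.running if j[1] == "low"), None)
            if victim is not None:
                self.running.remove(victim)
                self.suspended.append(victim)
                self.running.append((job, priority))
        # (A real DRM would queue low-priority arrivals; omitted in this toy.)

    def finish(self, job):
        self.running = [j for j in self.running if j[0] != job]
        if self.suspended and len(self.running) < self.slots:
            self.running.append(self.suspended.popleft())   # resume

node = Node(slots=2)
node.submit("sim-1", "low"); node.submit("sim-2", "low")
node.submit("urgent-report", "high")   # suspends one of the simulations
node.finish("urgent-report")           # the suspended simulation resumes
```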

In any case, when configuring DRM software, it is important to keep in mind that not all jobs (or not all users for that matter) are created equal...

Tuesday, March 11, 2008

All of Your Data in One Basket

By Roderick Flores

I once worked with a person who wrote programs that only wrote to a single file. Once these programs were put into the grid environment, they would routinely create files that were hundreds of gigabytes in size. Nobody considered this to be a problem because the space was available and the SAN not only supported files of that size, but also performed amazingly well considering the expectations. While a single file simplifies the code and data management, there are a number of reasons why this is not a good practice.



  • You don't always need all of the output data at once, yet moving just the piece you want from the grid to your desktop for testing is not even an option.
  • The amount of computation time needed to recreate a huge file is significant.
  • There is no easy way to use multiple threads and/or processes for writing and reading the data.
  • Moving the file across the network takes a lot more time.
  • A file can only be opened in read-write mode by one process at a time. One large file is going to block a lot more modification operations than several smaller files would.
  • Backing the file up is remarkably more difficult. You cannot just burn it to a DVD, so it has to be sent to disk or tape. If you need to restore it, that can take a significant amount of time.
  • Your file is going to be severely fragmented on the physical drives and will therefore cause increased seek times.
  • You can no longer use memory-mapped files.
  • Performing a checksum on a large file takes forever (per-chunk checksums, as in the sketch after this list, are far more manageable).
  • Finally, if you had properly distributed the job across the Grid, you should not have such large files!!!
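
A minimal sketch of the alternative, assuming results arrive as an iterable of byte records: write numbered, fixed-size chunks and record a per-chunk SHA-256 checksum in a manifest, so that individual pieces can be copied, verified, backed up or re-read in parallel. The chunk size and naming scheme are arbitrary choices for the example.

```python
# Write output as fixed-size, individually checksummed chunks instead of one
# huge file.
import hashlib
from pathlib import Path

CHUNK_BYTES = 256 * 1024 * 1024    # 256 MB per piece (arbitrary)

def write_chunked(records, outdir: Path) -> None:
    """Write an iterable of byte strings into numbered chunk files, recording
    a SHA-256 checksum for each chunk in a MANIFEST file."""
    outdir.mkdir(parents=True, exist_ok=True)
    index, size, digest = 0, 0, hashlib.sha256()
    out = (outdir / f"part-{index:05d}.dat").open("wb")
    with (outdir / "MANIFEST").open("w") as manifest:
        for record in records:
            if size > 0 and size + len(record) > CHUNK_BYTES:
                out.close()
                manifest.write(f"part-{index:05d}.dat {digest.hexdigest()}\n")
                index, size, digest = index + 1, 0, hashlib.sha256()
                out = (outdir / f"part-{index:05d}.dat").open("wb")
            out.write(record)
            digest.update(record)
            size += len(record)
        out.close()
        manifest.write(f"part-{index:05d}.dat {digest.hexdigest()}\n")
```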


Why would anybody do such a thing?  All your data are belong to us?

Wednesday, March 5, 2008

Four Reasons to Attend the Open Source Grid and Cluster Conference

By Rich Wellner

We're combining the best of GlobusWorld, Grid Engine Workshop and Rocks-a-Palooza into one killer event in Oakland this May. Here's why you should come to the Open Source Grid and Cluster Conference:



  • Great Speakers: We're going to have the rock stars of the grid world speaking and teaching.
  • Great Topics: Dedicated tracks to each of the communities being hosted.
  • Community Interaction: The grid community is spread all over the world; this will be a meeting place to get face time with the people you know by name only.
  • You Can Speak: We're currently accepting agenda submissions for 90 minute panels and sessions.

This should be a fantastic conference; I look forward to meeting you there.

Monday, March 3, 2008

Grid vs Clouds? Who can tell the difference?

By Sinisa Veseli

The term "cloud computing" seems to be attracting lots of attention these days. If you google it, you'll find more than half a million results, starting with Wikipedia definitions and news involving companies like Google, IBM, and Amazon. There is definitely no shortage of blogs and articles on the subject. While reading some of those, I've stumbled upon an excellent post by John Willis, in which he shares what he learned while researching the "clouds".



One interesting point from John's article that caught my eye was his view of virtualization as the main distinguishing feature of "clouds" with respect to the "old Grid Computing" paradigm ("Virtualization is the secret sauce of a cloud."). While I do not disagree that virtualization software like Xen or VMware is an important part of today's commercial "cloud" providers, I also cannot help noticing that various aspects of virtualization were part of grid projects from their beginnings. For example, SAMGrid, one of the first data grid projects, has served (and still serves!) several of Fermilab's High Energy Physics experiments since the late 1990s, allowing users to process data stored at multiple sites around the world without requiring them to know where the data would come from or how it would be delivered to their jobs. In a sense, from a physicist's perspective the experiment data was coming out of a "data cloud". As another example, the "Virtual Workspaces Service" has been part of the Globus Toolkit (as an incubator project) for some time now. It allows an authorized grid client to deploy an environment described by workspace metadata on a specified resource; the environments that can be deployed with this service range from an atomic workspace to a full cluster.



Although I disagree with John's view on the differences between the "old grid" and the "new cloud" computing, I still highly recommend the article mentioned above, as well as his other posts on the same subject.