Wednesday, November 7, 2007

Hookin' Up is Hard to Do

By Roderick Flores

Previously we discussed the tension that grid managers face when supporting various stakeholders on an enterprise grid. In particular, we concluded that providing an isolated virtual operating environment to each business unit would be the easiest way to meet their competing and divergent needs. In this post we will explore the networking challenges that a grid of virtualized systems poses.



The primary challenge you face in this architecture is how to connect it all together. At first glance it seems simple enough: take your current grid, install a hypervisor on each of its nodes, and then start implementing your users’ specific environments. Sadly, this will probably not work.



In a typical grid you already face the challenge of connecting several hundred compute nodes to one another and to a storage network while keeping network latency low.



To illustrate the networking problems you would have in a virtualized grid, consider a system with a significant number of nodes used by several operational units. For example, imagine a large financial services company that provides banking, brokerage, insurance, mortgage, and financing services. Each of these business lines, while related, has its own distinct set of business application workflows. While there may be some overlap in the specific applications used by each unit, there is little guarantee that each group will use those applications in the same way, let alone use the same versions. Worse yet, a single business unit may have multiple operational workflows that do not run in similar environments (e.g., Windows-specific versus Linux-specific application suites). Finally, we grid managers would like development, test, and production instances to be segregated yet running on the same hardware.



It is easy to project having to support at least ten times as many virtual operating environments as physical ones; the actual number should be proportional to the number of unique operating environments your users require. In a standard grid you have a fixed set of computational resources that is reasonably static; in other words, systems do not appear and disappear on a regular basis. In a virtualized grid, however, operating environments will appear and disappear as a function of the business workflows your users schedule. You can imagine how quickly this becomes complicated.
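
To make that projection concrete, here is a back-of-the-envelope sketch. The unit, workflow, and instance counts below are illustrative assumptions, not figures from a real grid:

```python
# Back-of-the-envelope estimate of distinct virtual operating
# environments; all counts here are illustrative assumptions.

business_units = 5       # banking, brokerage, insurance, mortgage, financing
workflows_per_unit = 2   # e.g., a Windows suite and a Linux suite
instances = ["development", "test", "production"]

environments = business_units * workflows_per_unit * len(instances)
print(f"distinct operating environments: {environments}")  # prints 30

# Each environment may need images on many nodes at once, so the
# virtual-to-physical ratio grows with however many workflows your
# users have scheduled at any given moment.
```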



What is the best way to deliver these operating environments to the physical hardware? If we keep every image on local disk, then we must guarantee that there is sufficient disk space on each node, a practice that is not only costly but also scales poorly. If instead each operating environment keeps no more images than the maximum number of nodes required by any of its applications, we can reduce the number of virtual machines we need. Of course, this implies that the images are either stored on a SAN or transported to the individual physical nodes before the virtualized environment boots. Sadly, both of these approaches significantly increase network load. We will discuss scheduling and managing individual virtual machines in subsequent posts.
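
A rough sketch of the traffic involved in staging images shows why. The image size, node count, and link speed below are assumptions chosen for illustration:

```python
# Rough sketch of the network cost of moving VM images to nodes
# before boot; sizes and counts are assumptions for illustration.

image_size_gb = 8     # assumed size of one operating-environment image
nodes_per_job = 64    # assumed nodes allocated to a single job
link_gbps = 1         # assumed per-node link speed (gigabit Ethernet)

total_transfer_gb = image_size_gb * nodes_per_job
seconds_per_node = image_size_gb * 8 / link_gbps  # GB -> Gb, then / Gbps

print(f"data moved before the job can start: {total_transfer_gb} GB")
print(f"best-case transfer time per node:    {seconds_per_node:.0f} s")

# Whether the image is read from a SAN at boot or copied ahead of
# time, that traffic competes with the storage and inter-node
# traffic the grid already carries.
```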



How do we connect these virtual environments? If these systems were on segregated physical hardware (think Microsoft Windows versus Linux), we would likely keep them on their own networks and/or VLANs. After all, these environments generally should not interact with one another. Shouldn’t we do the same for the virtualized grid? If we chose not to, and instead used DHCP based upon physical topology to assign addresses to the virtualized environments, we could quickly run into trouble. Specifically, a single job executed on n nodes could conceivably land on n distinct networks and/or VLANs. This would significantly increase the size of the broadcast domain and require more work from your network switches, adding significant latency to all communication between the nodes. Clearly this is a poor choice unless you are always using most of your nodes for each job.
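
A toy simulation shows how quickly topology-based addressing fragments a job. The rack layout, VLAN-per-rack assignment, and scheduler here are simplified assumptions:

```python
# Toy illustration of the counter-example: if addresses (and hence
# VLANs) follow physical topology, a job scheduled onto whichever
# nodes happen to be free spans many broadcast domains.

import random

nodes_per_rack = 32
racks = 10   # assume one topology-based VLAN per rack
all_nodes = [(rack, n) for rack in range(racks) for n in range(nodes_per_rack)]

random.seed(7)
job_nodes = random.sample(all_nodes, 64)  # scheduler grabs 64 free nodes

vlans_spanned = {rack for rack, _ in job_nodes}
print(f"64-node job spans {len(vlans_spanned)} of {racks} VLANs")

# Traffic for this one job now crosses VLAN boundaries, enlarging
# the effective broadcast domain and adding work for the switches.
```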



Thus my preferred solution is to segregate the operational environments so that every physical node bridges traffic for several distinct networks over the same interface. Addresses would be assigned by virtual MAC address rather than by physical location; as in the counter-example, this is necessary because we cannot guarantee where on the physical network topology a particular job will be scheduled. In fact, we probably want to put VLAN tags on our packets so that our switches can operate more efficiently. Additionally, if your grid nodes have secondary interfaces, all communication with the hypervisor should be segregated onto its own management network.
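
One way to express that mapping is a lookup keyed on the virtual MAC. This is a minimal sketch; the MAC prefixes, VLAN IDs, and subnets are invented for illustration:

```python
# Minimal sketch of address assignment keyed on a VM's virtual MAC
# rather than its physical location. All values here are invented.

ENVIRONMENT_BY_MAC_PREFIX = {
    "52:54:00:aa": {"env": "banking-prod",   "vlan": 110, "subnet": "10.110.0.0/16"},
    "52:54:00:ab": {"env": "banking-dev",    "vlan": 111, "subnet": "10.111.0.0/16"},
    "52:54:00:ba": {"env": "brokerage-prod", "vlan": 120, "subnet": "10.120.0.0/16"},
}

def network_for(vm_mac: str) -> dict:
    """Return the environment network for a VM, wherever it boots."""
    prefix = vm_mac.lower()[:11]  # first four octets identify the environment
    try:
        return ENVIRONMENT_BY_MAC_PREFIX[prefix]
    except KeyError:
        raise LookupError(f"no environment registered for MAC {vm_mac}")

# The same VM gets the same VLAN tag and subnet on any physical node:
print(network_for("52:54:00:AA:01:2F"))
```

Because the lookup ignores where the VM landed, the environment’s network follows the virtual machine around the grid, which is exactly the property topology-based DHCP lacks.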



If this has not scared you away from the concept of the virtualized grid (I hope it hasn’t), we will continue to explore other hurdles inherent in this architecture in future posts.
