Wednesday, February 27, 2008

Breaking Out of the Core

By Roderick Flores


I think that one of the most exciting consequences of the rise of multicore is the possibility of overcoming the limitations of the WAN by processing where you collect your data. It is exceptionally difficult and/or expensive to move large amounts of data from one distant site to another, regardless of the processing capability you might gain. Paul Wallis has an excellent discussion of the economics and other key issues that the business community faces with computing on "The Cloud" in his blog Keystones and Rivets.





So how do cores help us get past the relatively high costs of the WAN?  The first signs of this trend will appear wherever significant amounts of data are collected out in the field.  Currently you have a number of options, none of them great, for retrieving your data for processing.  These include:



  • Provision the bandwidth required to move the data, typically at significant cost.
  • Significantly reduce the size or quality of the data and transmit it more affordably.
  • Write the data to media and collect it on a regular basis.
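
To put the first two options in perspective, here is a back-of-the-envelope comparison between pushing a field data set over a provisioned WAN link and simply couriering a box of disks. Every figure in it (survey size, link speeds, courier time) is a hypothetical placeholder, not a number from any particular project.

def transfer_days(data_tb, link_mbps, efficiency=0.7):
    """Days needed to move data_tb terabytes over a link_mbps WAN link,
    assuming it sustains only `efficiency` of its rated throughput."""
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 86400

survey_tb = 50                      # hypothetical field data set
for link_mbps in (10, 45, 155):     # e.g. satellite, T3, OC-3 class links
    days = transfer_days(survey_tb, link_mbps)
    print(f"{link_mbps:>4} Mbps link: {days:6.1f} days")

# Shipping media is bounded by courier time rather than data volume:
print("Courier the disks:  ~7.0 days (assumed)")

Even a generously provisioned link takes weeks to months for data sets of this size, which is why the media-and-courier option keeps winning despite its latency.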


There was never much consideration given to processing the data in situ because the computational power simply was not there. Multicore processors have allowed us to rethink this.




For example, consider one of the most sought-after goals in a hot industry: near-real-time monitoring of a reservoir for oil production and/or CO2 sequestration (see the Intelligent Oilfield and the IPCC Special Report on Carbon Dioxide Capture and Storage).  The areas where this is most desired tend to be fairly remote, such as offshore or in the middle of inhospitable deserts.  There is no network connectivity to speak of in these areas, let alone enough to move data from a large multi-component ocean-bottom seismic array like those found in the North Sea.
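
For a sense of scale, the sketch below estimates the raw data volume such an array might produce. The station count, component count, sample rate, and sample size are all assumptions chosen purely for illustration; they do not describe any actual North Sea deployment.

# All parameters are illustrative assumptions.
stations = 1000          # ocean-bottom receiver stations
components = 4           # three geophone components plus a hydrophone
sample_rate_hz = 500     # 2 ms sampling
bytes_per_sample = 4     # 32-bit samples

bytes_per_day = stations * components * sample_rate_hz * bytes_per_sample * 86400
print(f"~{bytes_per_day / 1e12:.2f} TB of raw records per day")
# Even a fraction of a terabyte per day overwhelms the kind of link
# you can realistically provision to an offshore or desert site.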



Consequently, a colleague of mine and I were tasked with determining how we might implement the company’s processing pipelines in the field.  Instead of processing the data with hundreds of processors and an equivalent number of terabytes of storage, everything needed to fit into, at most, a single computer rack.  Our proposal had to include power conditioning and backup, storage, processing nodes, management nodes (e.g., resource managers), and nodes for user interaction.  Electrical circuit capacity further limited our choices.  Needless to say, 30-60 processors were not enough capacity to seamlessly transition the algorithms from our primary data center.  The only way it could be done was by developing highly specialized processing techniques: a task that could take years.
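
The sizing exercise goes roughly like the sketch below: start from what one electrical circuit can deliver, subtract the support gear, and see how many compute nodes are left. The wattage and amperage figures are assumptions for illustration, not the actual numbers from that proposal.

# Illustrative rack-sizing arithmetic; all figures are assumptions.
circuit_watts  = 208 * 30 * 0.8   # one 208 V / 30 A circuit, derated to 80%
overhead_watts = 1000             # UPS losses, storage, management/login nodes
node_watts     = 250              # one dual-socket compute node

compute_nodes = int((circuit_watts - overhead_watts) // node_watts)
sockets = compute_nodes * 2
print(f"{compute_nodes} compute nodes -> {sockets} processor sockets")
# With the single-core CPUs of the time, that lands in the 30-60
# processor range -- nowhere near the hundreds used in the data center.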



Now that we are looking at 8 cores per processor, with 16 just around the corner, everything has changed.  Soon it will be possible to provision anywhere from 160-320 cores under the same constraints as before, and it is easy to imagine another doubling shortly thereafter.  Throw in some virtualization for a more nimble environment and we will be able to do sophisticated processing of data in the field.  In fact, high-quality and timely results could alleviate much of the demand for more intensive processing after the fact.
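
Counting the same rack in cores rather than sockets shows why the picture changes. The socket counts below are assumptions chosen to bracket the 160-320 figure above.

# Cores available in the rack as sockets go multicore (socket counts assumed).
for sockets in (20, 40):
    for cores_per_socket in (1, 8, 16):
        cores = sockets * cores_per_socket
        print(f"{sockets} sockets x {cores_per_socket:>2} cores/socket = {cores:>4} cores")
# At 8 cores per socket the rack holds a few hundred cores; 16-core
# parts roughly double that again under similar constraints.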



Who needs the WAN and all of its inherent costs and risks? Why pay for expensive connectivity when you could have small clusters with hundreds of processors available in every LAN? If remote processing becomes commonplace because of multicore, we might see the business community gravitate towards the original vision of the Grid.

1 comment:

  1. Roderick,
    A bit late in the day on my part, but thanks for the link and the kind words re my post.
    regards
    PJW
