Tuesday, December 4, 2007

What You Need to Know About Cluster Express 3.0

By Ivo Janssen

At Supercomputing 2007, Univa UD launched the Cluster Express 3.0 beta. If you were at SC'07, you might have attended one of my Cluster Express demos; if you missed them, this blog post is for you. As an engineer who worked on the CE3.0 release, I will cut through the marketing speak and tell you what Cluster Express can do for you.



Cluster Express is designed to be your one-stop shop for a full cluster software stack. This means we bundle the scheduler, the security framework, the cluster monitoring, and an easy installer that configures everything out of the box. On top of that, the whole solution is open source, including all the code that Univa UD contributed to the stack. You can go to our new community at www.grid.org and download the CE3.0 beta and its sources right now.



So let's go through all the components in more detail.



Installer
Our installer is a very simple utility that asks you fewer than five questions, after which it goes off and installs the main nodes, the execution nodes, and any remote login nodes. It then ties all these nodes together through a bootstrap service that is installed on the main node. This lets all the other nodes retrieve configuration information from the main node. The end result is a fully configured cluster, with sensible default configuration for the Grid Engine scheduler and the Ganglia monitoring, and all the certificates for security and authentication set up properly.



Scheduler
We bundle Grid Engine, and the installer configures every node so that, once you have run the installer on an execution node, that node is automatically part of the cluster, with sensible defaults for queues, communication, and scheduling settings.



Monitoring
We bundle, install, configure, and use various cluster monitoring tools such as Ganglia and ARCo, and tie everything together in a custom Monitoring UI that we wrote and delivered as part of the CE3.0 release. The Monitoring UI is not a bundled third-party tool but a genuinely new add-on to our solution. It brings together the system-level statistics that Ganglia offers with the job-level statistics that ARCo logs from Grid Engine. Because they are presented together in one UI, you can cross-reference jobs with the nodes they ran on and the load on those nodes. This lets you, for instance, see instantly what impact a job or task has on a certain node, in real time and through an easy-to-use graphical UI.
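
To give a feel for what that cross-referencing means, here is a minimal Python sketch. It is not the Monitoring UI's code, and it does not use the real ARCo or Ganglia schemas: the record types, field names, and sample data below are all made up for illustration. It simply matches each job to the load samples taken on its host while the job was running.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JobRecord:
    """Hypothetical job-level record (the kind of data ARCo logs)."""
    job_id: int
    host: str
    start: int  # epoch seconds
    end: int    # epoch seconds


@dataclass
class LoadSample:
    """Hypothetical system-level sample (the kind of data Ganglia collects)."""
    host: str
    timestamp: int  # epoch seconds
    load_one: float


def load_during_job(job: JobRecord, samples: List[LoadSample]) -> List[LoadSample]:
    """Return the samples taken on the job's host while the job was running."""
    return [
        s for s in samples
        if s.host == job.host and job.start <= s.timestamp <= job.end
    ]


def peak_load(job: JobRecord, samples: List[LoadSample]) -> Optional[float]:
    """Return the highest load seen during the job, or None if no sample matches."""
    matching = load_during_job(job, samples)
    return max((s.load_one for s in matching), default=None)


# Made-up data, for illustration only.
jobs = [
    JobRecord(job_id=101, host="node01", start=1000, end=1300),
    JobRecord(job_id=102, host="node02", start=1100, end=1200),
]
samples = [
    LoadSample("node01", 900, 0.10),
    LoadSample("node01", 1060, 1.85),
    LoadSample("node01", 1240, 2.40),
    LoadSample("node02", 1150, 0.95),
    LoadSample("node02", 1400, 0.05),
]

for job in jobs:
    print(job.job_id, job.host, peak_load(job, samples))
```

The Monitoring UI does this kind of join for you and presents the result graphically; the sketch only shows the idea of lining up job records and host statistics on host name and time.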



Security
We bundle and pre-configure many Globus Toolkit components such as MyProxy, Auto-CA, RFT, WS-GRAM, GridFTP and GSI-OpenSSH. Auto-CA and MyProxy are completely configured out of the box, so that the only thing you need to do is a simple myproxy-logon to acquire a token that is valid for use with all the other Globus commands such as globus-url-copy or globusrun-ws. The level of integration that we accomplished for all the GT components will definitely impress you, especially if you've been a Globus user before.
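
As a rough sketch of that workflow, the Python snippet below builds the two command lines involved: a myproxy-logon to get a credential, followed by a globus-url-copy transfer. The host name, user name, and file paths are placeholders I made up, not values that Cluster Express sets. The snippet only prints the commands; it does not run them.

```python
import shlex
from typing import List


def myproxy_logon_cmd(server: str, username: str) -> List[str]:
    """Build a myproxy-logon command (-s: MyProxy server, -l: user name)."""
    return ["myproxy-logon", "-s", server, "-l", username]


def gridftp_copy_cmd(source_url: str, dest_url: str) -> List[str]:
    """Build a globus-url-copy command that copies source_url to dest_url."""
    return ["globus-url-copy", source_url, dest_url]


def transfer_workflow(server: str, username: str,
                      source_url: str, dest_url: str) -> List[List[str]]:
    """Return the commands in order: log on first, then copy."""
    return [
        myproxy_logon_cmd(server, username),
        gridftp_copy_cmd(source_url, dest_url),
    ]


# Placeholder values, for illustration only.
commands = transfer_workflow(
    server="head.example.org",
    username="alice",
    source_url="file:///home/alice/input.dat",
    dest_url="gsiftp://head.example.org/tmp/input.dat",
)

for cmd in commands:
    # shlex.join (Python 3.8+) renders the list as a copy-and-paste shell line.
    print(shlex.join(cmd))
```

To actually run the commands, you could pass each list to subprocess.run on a node where the Globus tools are installed; myproxy-logon will prompt for your passphrase.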



Putting it all together
As mentioned, bundling all of the above components in a tarball with an easy-to-use installer makes setting up a fully featured cluster as simple as downloading one file and running one command. This is really as easy as we could make it! And on top of that, everything is open source, including our own add-ons such as the installer, the configuration scripts, and the Monitoring UI.



I hope to welcome you soon to our new Cluster Express community website at www.grid.org. You can download the CE3.0 tarball there, participate in the forums, add to our wiki, or get support through our mailing lists.



I'm user "Leto" on grid.org. Please don't hesitate to send me a private message there if you need any help at all.
