By Ivo Janssen
Ask a user why they use a grid, a cluster, or any other type
of distributed system and you’ll hear, “Why, to get my work done faster, of
course.” But that’s an ambiguous statement at best, since it can mean two things:
faster runtimes or higher throughput. And although they might seem similar,
they’re really not.
Runtime is defined as the wall-clock time it takes to
complete one task. If you parallelize a task, for instance with MPI, or by
taking advantage of the data splitting capabilities of Grid MP, you can get
your job back in less time. If you can split your job into 10 parallel
sub-jobs and run them on 10 nodes, you can expect that job to complete on average
in 1/10th of the time, plus a bit of overhead of course, but let's keep it
simple for now. In Volvo's innovative Uddevalla
plant, groups of workers assemble entire automobiles in less time than it takes
for one worker to complete a whole car. So with 10 workers in a group, you
could potentially make a car in 1/10th of the time.
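The splitting idea can be sketched in a few lines of Python. This is only an illustration of the pattern, not Grid MP's or MPI's actual API; the function names are mine, and the thread pool merely stands in for real grid nodes (CPython threads won't actually speed up CPU-bound work):

```python
from concurrent.futures import ThreadPoolExecutor

def subtotal(chunk):
    # One sub-job: a worker computes a partial result over its slice of the data.
    return sum(chunk)

def parallel_sum(data, workers=10):
    # Split the job into `workers` sub-jobs, one slice per worker, then
    # combine the partial results; the same splitting idea applies whether
    # the workers are threads, MPI ranks, or grid nodes.
    chunks = [data[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(subtotal, chunks))

print(parallel_sum(list(range(100))))  # same answer as sum(range(100)): 4950
```

With 10 workers and negligible overhead, each sub-job touches 1/10th of the data, which is where the "1/10th of the time" expectation comes from.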
However, sometimes a task cannot be parallelized any further,
but you might have lots of such tasks pending. Grids can still help, since they can
increase the throughput of your jobs. Queuing theory tells us that with 10 nodes
and 10 jobs, a job still completes on average every 1/10th
of the runtime of a single job, without using any parallelism. In a traditional
American automotive plant, the car advances along the assembly line, and at no
point is more than one operator working on one car, so there's no parallelism
involved. It might take up to a day before one car is completed from start to
finish, but a new car rolls off the end of the line every few minutes.
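The back-of-the-envelope arithmetic can be written down as a small model. This is a sketch under simplifying assumptions (identical, unparallelized jobs, no scheduling overhead); the function names are mine:

```python
import math

def makespan(jobs, nodes, job_time):
    # Wall-clock time to drain a queue of identical, unparallelized jobs:
    # each node runs ceil(jobs / nodes) jobs back to back.
    return math.ceil(jobs / nodes) * job_time

def mean_time_per_completion(jobs, nodes, job_time):
    # On average, one job rolls "off the line" every makespan / jobs units.
    return makespan(jobs, nodes, job_time) / jobs

# 10 one-hour jobs on 10 nodes: all done in 60 minutes, so a job
# completes on average every 6 minutes, 1/10th of a single job's runtime.
print(mean_time_per_completion(10, 10, 60))  # 6.0
```

Note that each individual job still takes its full runtime; only the rate at which finished jobs appear improves, which is exactly the assembly-line effect.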
So the next time a user brags about his fancy new cluster,
ask him whether he’s producing Fords or Volvos.
Comments

The difference between the two examples is not that one is parallel and the other not, but rather that one is fine-grain parallelism and the other coarse-grain. The Ford assembly line is just the standard master-slave model.

I personally draw the line between parallelism and batch differently. As soon as one part of the calculation doesn't make any sense, I call it parallelism. Producing one Ford makes sense, since it completes one full car. Having one Volvo team member put in just the engine while no seats are installed does not.

The master-slave analogy only relates to the dispatching of the job, which has nothing to do with its inherent parallelism.

"I personally draw the line between parallelism and batch differently. "
This sounds like a pretty arbitrary distinction.
"As soon as one part of the calculation doesn't make any sense, I call it parallelism. Producing one Ford makes sense, since it completes one full car. Having one Volvo team member put in just the engine while no seats are installed does not."
I cannot parse this. What does it mean "to make sense"? I don't see why the Volvo team member would not put the engine in before the seats are installed. Perhaps it doesn't make sense to install the engine before the frame is built, but that's why it's fine-grain parallelism.
"The master-slave analogy only relates to the dispatching of the job, which has nothing to do with its inherent parallelism."
The point of the master-slave is that you have a big job that you split up among workers (who don't need to talk to each other) and then put back together. This is the Ford assembly line. Think of the conveyor belt as the master: each slave is a worker that does one job, then the master takes the product and moves it to the next stop. The master's job is also to make sure each slave has work. Whether you acknowledge it or not, this is a model of parallel work; pick up any standard book on parallel computing and this will be the first model they teach.
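The master-worker pattern the commenters are debating can be sketched in a few lines. This is the generic textbook pattern, not either commenter's exact model; the names and the squaring "unit of work" are illustrative only:

```python
import queue
import threading

def master_worker(tasks, workers=4):
    # The master fills a shared queue; each slave pulls one task at a time,
    # does its one job, and hands the result back. No slave talks to another.
    todo = queue.Queue()
    done = queue.Queue()
    for t in tasks:
        todo.put(t)

    def slave():
        while True:
            try:
                t = todo.get_nowait()
            except queue.Empty:
                return          # no work left: this slave is finished
            done.put(t * t)     # stand-in for the real unit of work

    threads = [threading.Thread(target=slave) for _ in range(workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    # The master reassembles the pieces at the end.
    return sorted(done.get() for _ in tasks)

print(master_worker([1, 2, 3, 4]))  # [1, 4, 9, 16]
```

Keeping every slave busy (the master's dispatching job) and splitting the work itself are separate concerns, which is roughly the distinction the second commenter is drawing.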