Tuesday, September 25, 2007

Why Are 92% Of Users Waiting for I/O?

By Rich Wellner

stoplight_000004020985XSmall.jpgIDC recently published a finding that 92% of users have applications that are I/O constrained. That's a shockingly large number given the options that exist for reducing this pain. Let's break this down into three major categories:



  • Working storage
  • Near-line storage
  • Long-term storage


The nature of each of these domains is very different and the options available to reduce the problems are similarly different.



Working Storage



Administrators unfamiliar with the demands of certain classes of applications will sometimes mount their scratch disk as (oh, the horror!) NFS. Over the past five years the average knowledge level has crept up as the community has grown and gained experience, but I have seen in the last couple months that there are still clusters out there than are doing significant I/O to slow network scratch.



Near-line Storage



Typically consisting of disk, NAS or SAN devices people have a tendency to not buy enough bandwidth (either in the form of network I/O or aggregate disk I/O) or view the purchase of these facilities as one time expenses, failing to keep up with their users expanding demand as time progresses.



Long-term Storage



Migrating data to tape is the only game in town for long term, high capacity storage (so far the decade old promise of using disk for long term storage still seems to be a decade away). The problem is that with drives and automated libraries costing enormous amounts of money, coupled with the latencies inherent in this type of storage, applications are left sitting tapping their feet waiting for data to stream in.



The Solution



The services in the grid must be programmed to be aware of the data they require. An early example of this is the DDM system in incubation in dev.globus. This system knows what data is located in different resources on the grid and can thus be integrated with workflow and scheduling systems to pre-stage data to working storage before the application is started. This completely eliminates two of the three I/O constraints, and the last one is the easiest and cheapest. Just stop mounting your scratch on NFS...

No comments:

Post a Comment