Wednesday, December 12, 2007

Proper Testing Environments

By Roderick Flores

It continues to amaze me how many businesses do not have tiered development environments.  Many of these same companies maintain a very sophisticated production environment with strict change-management procedures, yet somehow they consider it acceptable to keep inadequate staging environments.



However, we know better: a truly supportive development environment, in the parlance of the agile-development community, must contain a series of independent infrastructures, each serving a specific risk-reduction purpose. The idea is that by the time a release reaches the production environment, all of its desired functionality has already been proven operational.  A typical set of tiers might include:



  • Development – an uncontrolled environment.
  • Integration – a loosely controlled environment where the individual pieces of a product are brought together.
  • Quality Assurance – a tightly controlled environment that mirrors production as closely as possible.
  • Production – the environment where your customers access the final product.



Check out Scott Ambler’s diagram of a supportive environment on Dr. Dobb's to see a logical organization of this concept.



So what happens if you cut corners along the way?   I am sure you all know what I am talking about.  Here are a couple of my past favorites (in non-grid environments):




Situation:  You combine quality assurance with the integration and/or the development environments.


Result: New releases for your key products inexplicably fail in production (despite having passed QA testing) because your developers made changes to the operating environment for their latest product.




Situation: You test a load-balanced n-tier product in a small (two- to three-machine) QA environment.


Result:  The application exhibits infrequent but unacceptable data loss because updates from one system are overwritten by those from another.  This is particularly onerous to uncover because the application does not fail in the QA environment.
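
To make that failure mode concrete, here is a toy sketch of the kind of unprotected read-modify-write that produces it; the worker code is entirely made up and stands in for whatever your load-balanced tier actually does. With only a couple of lightly loaded QA machines the bad interleaving rarely fires, which is exactly why the bug sails through testing.

    import random
    import threading
    import time

    # A shared record standing in for a row that every node in the
    # load-balanced tier reads and writes back whole (the naive pattern
    # assumed for this sketch).
    record = {"views": 0, "last_seen": None}

    def worker(name, iterations):
        for _ in range(iterations):
            # Unprotected read-modify-write: copy the record, change it,
            # write the whole thing back.  No lock, no version check.
            snapshot = dict(record)
            snapshot["views"] += 1
            snapshot["last_seen"] = name
            time.sleep(random.uniform(0, 0.001))  # simulated processing time
            record.update(snapshot)               # clobbers concurrent writes

    threads = [threading.Thread(target=worker, args=(f"node-{i}", 200))
               for i in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Eight workers x 200 increments should give 1600 views; the shortfall is
    # the number of silently lost updates.  Drop the worker count to two and
    # the window shrinks, which is how a tiny QA environment hides the bug.
    print(f"expected 1600 views, got {record['views']}")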



Presumably, we grid managers do all that we can to provide test frameworks adequate to avoid problems such as these. There are many texts that discuss best practices for supportive development environments. Unfortunately, I have found that many of us forget one of our core lessons: everything becomes much more complicated at grid scale.  Consequently, we are perfectly willing to use a QA environment scoped for a small cluster.





In particular, many of us prefer to limit our QA environments to a few computation nodes and thus choose to run our load tests on the production infrastructure.  Conceptually, this makes sense: we cannot realistically maintain the same number of nodes in QA as in production, so why keep a significant number when we will end up running some of our tests out there anyway?  Sadly, this approach severely complicates performance measurement.



For example, assume that my test plan dictates that I run tests ranging from one to sixty-four nodes for a particular application.  If I run these in production, I get essentially random loads on the SAN, the network, and even the individual servers to which I am assigned.  Consequently, I have to repeat each test in the plan until I am certain that I have a statistically significant sample of grid states.  Even then, I have only defined my capacity for the average utilization rate during my testing window; any change to the grid, such as a shift in usage patterns or the addition of resources, will invalidate my results.
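
To see why that gets expensive, here is a rough sketch of the bookkeeping such a plan forces on you. The runtimes are invented, and the analysis is nothing more than a mean and a confidence interval per node count, repeated until the interval is tight enough to trust.

    import statistics

    # Hypothetical wall-clock runtimes (seconds) from repeating the same test
    # at each node count while the production grid carries whatever background
    # load it happens to have.  The numbers are purely illustrative.
    runs = {
        1:  [412, 398, 455, 401, 460, 420],
        8:  [66, 58, 91, 62, 80, 75],
        64: [13, 11, 22, 12, 19, 25],
    }

    for nodes, times in sorted(runs.items()):
        mean = statistics.mean(times)
        # Approximate 95% confidence interval for the mean (normal
        # approximation); a wide interval means still more repetitions are
        # needed before the number says anything about capacity.
        half_width = 1.96 * statistics.stdev(times) / len(times) ** 0.5
        print(f"{nodes:3d} nodes: {mean:6.1f}s +/- {half_width:.1f}s "
              f"({100 * half_width / mean:.0f}% of the mean)")

    # Even a tight interval only describes the background utilization sampled
    # during the test window; change the usage pattern or add resources and
    # the estimate no longer applies.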



Clearly, I need to run the application on a segregated infrastructure to get proper theoretical performance estimates.  The segregated infrastructure, like any QA environment, should match production as closely as possible; however, in order to eliminate external factors that seriously affect performance, it is imperative that it use isolated network equipment as well as isolated storage.  Another advantage of this approach is that we reduce the risk of impacting production capacity with a runaway job.  By contrast, testing in production takes a large number of runs to produce numbers that average out the current load and thus approach theory, and that volume of testing may well cut into the grid users’ productivity.



As we noted earlier, we cannot justify a QA environment that is anything more than a fraction of production.  However, I am certain that eight nodes is not enough.  QA should contain enough nodes to adequately model the speed-up that your business proponents expect from their typical application sets, and it would not hurt to do some capacity planning at this point.  In the absence of that, thirty-two computation nodes is the minimum size I would use for a grid that is expected to contain several hundred nodes.
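
One back-of-the-envelope check for that sizing, offered as a sketch rather than a rule, is Amdahl's law: the QA grid should be big enough to show where the speed-up curve of a typical application flattens out. The serial fractions below are illustrative assumptions, not measurements of any real application.

    def amdahl_speedup(nodes, serial_fraction):
        """Ideal speed-up on `nodes` processors for a job whose
        non-parallelizable share of the work is `serial_fraction`."""
        return 1.0 / (serial_fraction + (1.0 - serial_fraction) / nodes)

    # Assumed serial fractions for a few notional application profiles.
    for serial in (0.01, 0.05, 0.10):
        curve = {n: amdahl_speedup(n, serial) for n in (8, 32, 128, 512)}
        print(f"serial={serial:.2f}: " +
              ", ".join(f"{n} nodes -> {s:5.1f}x" for n, s in curve.items()))

    # With a 10% serial fraction most of the achievable speed-up is already
    # visible by 32 nodes; with a 1% serial fraction the curve is still
    # climbing steeply past 32 nodes, and the QA grid (or the capacity plan)
    # needs to be bigger.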



Finally, once we have a reasonable understanding of the theoretical capabilities of the application, we should re-run the performance tests under production loads.  This will show us how much productivity our applications lose under load, which in turn could help justify the expense of additional resources even when utilization rates alone cannot.
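
As a sketch of what that comparison might buy you (the runtimes below are invented), the isolated baseline turns the under-load numbers into a concrete efficiency-loss figure you can put in front of whoever approves the hardware.

    # Invented numbers: mean runtimes (seconds) for the same application
    # measured on the segregated QA grid (the theoretical baseline) and again
    # on the production grid under its normal load.
    baseline   = {8: 75.0, 32: 21.0, 64: 12.0}
    under_load = {8: 96.0, 32: 30.0, 64: 19.5}

    for nodes in sorted(baseline):
        slowdown = under_load[nodes] / baseline[nodes]
        lost = 100 * (1 - baseline[nodes] / under_load[nodes])
        print(f"{nodes:2d} nodes: {slowdown:.2f}x slower under load; "
              f"{lost:.0f}% of effective capacity lost to contention")

    # Even when raw utilization figures look acceptable, a 20-40% loss of
    # effective throughput on key applications is an argument for more
    # resources that utilization alone does not make.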



I know you are asking, “How do I justify the expense of a large QA environment?”  Well, just think about the time you will save during your next major operating-system change, when you have to test ALL of the affected production applications before migrating that change into production.  Would you prefer to do this on a few nodes, take several out of production, or just get it done in a properly sized test environment?
