Pushing System Performance with Stress Testing
In performance testing, you often hear people talk about “stress testing” a system to make sure it performs correctly. The challenge comes when you ask them what exactly they mean by “stressing” the system, because the term can describe several different kinds of stress.
In a typical performance test that uses an operational profile to place load on the system, the focus is on identifying resources that begin to reach their maximum usable capacity at varying load levels (volume). For example, if you are running an average load test and memory utilization begins to approach 75 to 80 percent, memory is under “stress”: it is becoming a bottleneck on the system. Increasing the load further will push this resource beyond acceptable stress limits.
Most performance tests look for these types of situations. The load at which a resource reaches this point of stress is also the highest load level the system can currently handle acceptably. These stress levels are a problem if they occur below the typical average or peak load levels the system is expected to handle.
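As a rough illustration of the check described above, the following Python sketch scans utilization measurements taken at increasing load levels and reports the first load at which a resource crosses a stress threshold. The function name, the sample data, the 75 percent threshold, and the expected peak of 450 users are all hypothetical values chosen for the example, not figures from a real system.

```python
from typing import Optional, Sequence, Tuple


def first_stressed_load(
    samples: Sequence[Tuple[int, float]],
    threshold_pct: float = 75.0,
) -> Optional[int]:
    """Return the lowest load level whose utilization meets the threshold.

    samples holds (load level, utilization percent) pairs.
    Returns None if the resource never reaches the threshold.
    """
    for load, utilization in sorted(samples):
        if utilization >= threshold_pct:
            return load
    return None


# Hypothetical measurements: (concurrent users, memory utilization %).
memory_samples = [(100, 42.0), (200, 58.5), (300, 71.0), (400, 78.5), (500, 91.0)]

# Assumed peak load the system is expected to handle.
expected_peak_load = 450

stress_load = first_stressed_load(memory_samples)

# Stress at or below the expected peak means the resource is a bottleneck
# within the load range the system must support.
is_problem = stress_load is not None and stress_load <= expected_peak_load
print(f"memory stressed at load {stress_load}; problem: {is_problem}")
```

With these sample numbers, memory becomes stressed at 400 users, which is below the assumed peak of 450, so the result would be flagged as a problem. In practice the threshold would differ by resource and by system.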
A second common type of stress we address in performance testing is pushing the load beyond a normal peak level to see how much capacity the system has. This is often done as part of capacity planning or scalability testing. The system is tested incrementally beyond its normal peak load to find the level at which a resource becomes “stressed” and prevents the system from handling any additional load. If you know this maximum load level and the rate at which load on the system is growing, you can estimate how long the system will continue to operate within acceptable parameters.
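The estimate can be sketched in a few lines of Python. This version assumes load grows at a constant compound rate per period (for example, 5 percent per month); that growth model, the function name, and the sample figures are assumptions made for illustration, and a system with linear or seasonal growth would need a different formula.

```python
import math


def periods_until_capacity(
    current_peak: float,
    max_load: float,
    growth_rate: float,
) -> float:
    """Estimate how many periods remain before peak load reaches max_load.

    Assumes compound growth: load after n periods is
    current_peak * (1 + growth_rate) ** n.
    """
    if current_peak <= 0 or max_load <= 0:
        raise ValueError("load levels must be positive")
    if growth_rate <= 0:
        raise ValueError("growth_rate must be positive")
    if current_peak >= max_load:
        return 0.0  # already at or beyond the measured maximum
    return math.log(max_load / current_peak) / math.log(1.0 + growth_rate)


# Hypothetical figures: current peak of 1,000 users, a measured maximum of
# 1,600 users, and load growing 5 percent per month.
months_left = periods_until_capacity(1000, 1600, 0.05)
print(f"estimated months of headroom: {months_left:.1f}")
```

Under these assumed numbers the system has a little under ten months of headroom. The result is only as good as the growth estimate, so it is worth recomputing as real usage data arrives.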
A related goal of capacity planning is to see how the system behaves when load pushes a resource beyond its maximum level of sustained usage. Does the system crash, or does it just stop responding? In a UI-based application, however, pushing the system to the failure point with load alone does not make much sense. Once a resource approaches its maximum usable capacity, response times become so long that users will naturally abandon the system, thereby reducing the overall load.
If the goal is to test the recovery capabilities of the system after a failure, a more effective method than stress testing is to run a test at a normal or average load level, cause the system to fail, and assess the recovery. With a lower level of activity, it is much simpler to determine whether the recovery process worked correctly.
However, pushing embedded systems to the point of failure may make perfect sense. In an embedded system, having one subsystem fail may cause a chain reaction in other subsystems and result in a general system failure. Testing to ensure a failed subsystem does not generate a cascade failure is quite useful.
All performance testing is related to some form of stress. The key is knowing which stress tests make sense in your situation.