The second part of our blog post series gives an overview of the different types of performance testing. Learn more about load testing, scalability testing, stress, spike and soak testing, configuration testing, as well as availability and resilience testing. The article is based on a talk I gave at the AWS PopUp Loft, DevOpsCon 2016 and other occasions.
Types of Testing
In the last blog post I wrote about performance, scalability and the importance of performance testing in the cloud era. So, now, let us take a closer look at the performance testing methods.
Needless to say, some of these types of testing are not really special when it comes to a cloud environment, but others are especially interesting.
Load testing is the simplest form of performance testing. You apply a normal or expected workload to a system under test and observe it. You can use load tests to determine general system behavior, latency and throughput. In general, load tests are used to verify your quality criteria.
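A load test can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for a real load testing tool: `fake_request` is a hypothetical stand-in for a call to your system under test, and the request count and concurrency are assumed values.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(target):
    """Invoke the target once and return its latency in milliseconds."""
    start = time.perf_counter()
    target()
    return (time.perf_counter() - start) * 1000.0


def summarize(latencies_ms, duration_s):
    """Condense raw latencies into throughput and latency percentiles."""
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "requests": len(latencies_ms),
        "throughput_rps": len(latencies_ms) / duration_s,
        "p50_ms": cuts[49],
        "p95_ms": cuts[94],
    }


def run_load_test(target, total_requests, concurrency):
    """Send total_requests calls to the target using a fixed number of workers."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(lambda _: timed_call(target), range(total_requests)))
    return summarize(latencies, time.perf_counter() - start)


def fake_request():
    # Hypothetical stand-in for a real request to the system under test.
    time.sleep(0.005)


if __name__ == "__main__":
    print(run_load_test(fake_request, total_requests=200, concurrency=10))
```

The resulting throughput and percentile values are what you would compare against your quality criteria.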
Stress testing is basically a load test, but you apply a higher-than-expected workload to see how the system behaves under serious stress and when it exceeds its design limits. You want to learn when your system breaks and how it starts to fail in a serious traffic situation.
A typical approach is to steadily increase the load to see where the system under test begins to violate its non-functional requirements. You can use this "tipping point" to describe the capacity of the given system, like "we can handle 1000 concurrent users per application server before we start to violate our quality requirements".
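The search for the tipping point can be sketched as a simple ramp. In this illustration `simulated_p95_ms` is a made-up latency model standing in for a real measurement at each load level, and the 500 ms limit is an assumed quality requirement.

```python
def simulated_p95_ms(concurrent_users):
    # Hypothetical latency model: flat up to 800 users, then degrading quickly.
    overload = max(0, concurrent_users - 800)
    return 150.0 + 1.5 * overload


def find_capacity(measure_p95_ms, load_steps, limit_ms):
    """Return the highest load level that still met the latency limit, or None."""
    capacity = None
    for users in load_steps:
        if measure_p95_ms(users) > limit_ms:
            break  # first violation: the tipping point has been passed
        capacity = users
    return capacity


if __name__ == "__main__":
    steps = range(100, 2001, 100)  # ramp from 100 to 2000 users in steps of 100
    print(find_capacity(simulated_p95_ms, steps, limit_ms=500.0))
```

In a real test, the measurement function would run a load test at the given level and return the observed latency instead of a formula.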
With scalability testing, you change the perspective to answer the question: How effectively can I grow? You can run a series of stress tests and gather data on how effective you really are.
By running a series of stress tests in which you steadily increase the system's resources, you can tell whether your system translates them into additional capacity.
Knowing how well and how far your system will scale by adding resources, you can now make an informed decision: You might not need to do anything right now, or you need to take action (e.g. remove bottlenecks). Or you simply throw more resources at your problem to mitigate a scaling issue for the time being. Suffice it to say that this is a basic requirement for capacity planning and cost estimation.
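One way to put a number on "how effectively can I grow" is to compare the measured capacity with perfect linear scaling. The figures below are invented for illustration; in practice they would be the tipping points from your own series of stress tests.

```python
def scaling_efficiency(capacity_by_servers):
    """Compare each measured capacity with linear scaling from the smallest setup."""
    base_servers = min(capacity_by_servers)
    base_capacity = capacity_by_servers[base_servers]
    report = {}
    for servers in sorted(capacity_by_servers):
        speedup = capacity_by_servers[servers] / base_capacity
        ideal = servers / base_servers
        report[servers] = speedup / ideal
    return report


# Hypothetical tipping points (concurrent users) found by stress tests
# with 1, 2, 4 and 8 application servers.
measured = {1: 1000, 2: 1900, 4: 3400, 8: 4800}

if __name__ == "__main__":
    for servers, efficiency in scaling_efficiency(measured).items():
        print(f"{servers} servers: {efficiency:.0%} of linear scaling")
```

An efficiency that drops sharply from one step to the next, as between 4 and 8 servers in this made-up data, hints at a bottleneck that additional resources alone will not remove.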
Spike testing can be used to determine how well your system can cope with sudden traffic spikes. It is comparable to a load or stress test, but modeled as a sudden burst of traffic. It can be good preparation for a planned marketing campaign or an unplanned event like being featured on Reddit or Hacker News. Spike testing can tell you whether you are making good use of the elasticity of the cloud when faced with these kinds of events.
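A spike test needs a load profile with a sudden burst and a way to judge how quickly the system recovers afterwards. The sketch below shows both under assumed numbers; the per-second latencies are invented sample data, not real measurements.

```python
def spike_profile(baseline_rps, spike_rps, total_s, spike_start_s, spike_length_s):
    """Return the target request rate for every second of the test."""
    profile = []
    for second in range(total_s):
        in_spike = spike_start_s <= second < spike_start_s + spike_length_s
        profile.append(spike_rps if in_spike else baseline_rps)
    return profile


def recovery_seconds(p95_per_second_ms, spike_end_s, limit_ms):
    """Seconds after the spike ended until latency is back within the limit."""
    for second in range(spike_end_s, len(p95_per_second_ms)):
        if p95_per_second_ms[second] <= limit_ms:
            return second - spike_end_s
    return None  # never recovered within the observed window


if __name__ == "__main__":
    print(spike_profile(50, 1000, total_s=10, spike_start_s=3, spike_length_s=2))
    # Hypothetical p95 latency observed in each second of that test.
    observed = [100, 100, 100, 900, 950, 700, 400, 120, 110, 100]
    print(recovery_seconds(observed, spike_end_s=5, limit_ms=200))
```

A long recovery time after the burst has ended can indicate that scaling reacts too slowly to benefit from the elasticity of the cloud.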
Soak testing, again, is basically a load test where you hold the load over a longer period of time to look for long-term effects such as memory leaks or disk space filling up. The duration of a soak test depends on your situation. Usually a soak test runs for several hours.
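Long-term effects such as a memory leak show up as a steady trend in the metrics collected during the soak test. As a rough sketch, a least-squares slope over the samples gives the growth per hour; the memory readings and the threshold below are assumed values for illustration.

```python
def growth_per_hour(samples):
    """Least-squares slope of (elapsed_hours, value) samples."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    variance = sum((x - mean_x) ** 2 for x, _ in samples)
    return covariance / variance


def looks_like_leak(samples, threshold_per_hour):
    """Flag a metric whose steady growth exceeds the tolerated rate."""
    return growth_per_hour(samples) > threshold_per_hour


# Hypothetical memory usage in MB, sampled once per hour under constant load.
memory_mb = [(0, 512), (1, 530), (2, 549), (3, 566), (4, 585)]

if __name__ == "__main__":
    print(f"growth: {growth_per_hour(memory_mb):.1f} MB/hour")
    print("possible leak:", looks_like_leak(memory_mb, threshold_per_hour=5.0))
```

The same calculation works for disk usage or open file handles, as long as the load is held constant during the test.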
While load, stress, spike and soak testing are not particularly special when it comes to the cloud, the next testing method is one of the most interesting ones: configuration testing.
The perspective now shifts to how performance changes when the configuration is modified. The change might be positive, but it can also be negative if you want to optimize for costs (remember: performance is resource usage per unit of work, and money is also a resource). The point is that you know and can quantify the change.
Configuration can be almost anything here: your environment, services that you are using, dependencies of your software — all can be seen as configuration.
Configuration testing is, for obvious reasons, a very important tool for learning about the impact of a system's environment on its performance. It is always a series of test runs where you compare and analyze the impact of multiple configurations.
Now, it starts to get really interesting: We have a lot of configuration options to choose from when it comes to the cloud. These options include:
- Instance types selection (for EC2, RDS, EC, ES)
- Auto scaling configuration (Scaling Policies, Instance Launch Time, Scaling Lifecycle)
- Throughput provisioning (EBS IOPS, DynamoDB throughput, Kinesis bandwidth)
- Service usage optimization (ELB pre-warming, Index Usages)
Finally, you reach the parts that you have more or less fully under your control, like the operating system, network stack and other kernel settings, software/web server/app server configuration, dependencies, etc.
To emphasize this once more: The aim is to look for changes in performance. So you can use configuration testing to optimize for costs as well, as long as you know what kind of trade-off you are making.
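Treating money as a resource, the results of a configuration test series can be compared as cost per unit of work. The configuration names, prices and throughput figures below are invented for illustration and do not refer to real instance types.

```python
def cost_per_million_requests(hourly_cost, throughput_rps):
    """Money spent per one million requests at sustained throughput."""
    requests_per_hour = throughput_rps * 3600
    return hourly_cost / requests_per_hour * 1_000_000


# Hypothetical results: hourly price and the sustained throughput that still
# meets the quality criteria, one entry per tested configuration.
results = {
    "config-a": {"hourly_cost": 0.10, "throughput_rps": 200},
    "config-b": {"hourly_cost": 0.20, "throughput_rps": 500},
}

costs = {
    name: cost_per_million_requests(r["hourly_cost"], r["throughput_rps"])
    for name, r in results.items()
}
cheapest = min(costs, key=costs.get)

if __name__ == "__main__":
    for name, cost in costs.items():
        print(f"{name}: {cost:.4f} per million requests")
    print("cheapest per unit of work:", cheapest)
```

In this made-up data the configuration with the higher hourly price is the cheaper one per request, which is exactly the kind of trade-off a configuration test makes visible.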
Availability & Resilience Testing
The last type of performance testing I will introduce is availability & resilience testing. I will only touch on this briefly, because the topic is probably enough for a blog article of its own. The idea is to look at certain processes and behaviors under load and check whether you have covered them.
- What about deployments? Possibly even with DB migrations under load?
- Ever changing, ever evolving infrastructure? Automatic scaling environments?
- Failure scenarios and failover mechanisms?
Principles of Chaos Engineering
The idea of giving this its own testing category is inspired by the Principles of Chaos Engineering. You have most probably checked those points in some form of manual or automated functional testing, but did you also check them under load? Not everyone is Netflix and can do that in production, but at least think about doing it in a test environment using artificial traffic.
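To check such a scenario under load, you can keep artificial traffic running while you deploy or trigger a failover, and then look at the success ratio over time. The sketch below groups request results into time windows; the recorded results are invented sample data.

```python
def availability_by_window(results, window_s):
    """Group (timestamp_s, succeeded) results into windows; return success ratio per window."""
    totals = {}
    for timestamp_s, succeeded in results:
        window = int(timestamp_s // window_s)
        ok, count = totals.get(window, (0, 0))
        totals[window] = (ok + (1 if succeeded else 0), count + 1)
    return {w: ok / count for w, (ok, count) in sorted(totals.items())}


# Hypothetical request results recorded while a deployment ran under load.
recorded = [
    (0, True), (1, True),
    (5, True), (6, False),
    (10, False), (11, False), (12, True),
    (15, True),
]

if __name__ == "__main__":
    ratios = availability_by_window(recorded, window_s=5)
    worst = min(ratios, key=ratios.get)
    print(ratios)
    print(f"worst window starts at second {worst * 5}")
```

The worst window shows how deep the dip was, and the number of affected windows shows how long the system needed to become fully available again.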
Want to know more? Take a look at the previous article:
And here you find the third part of our blog posting series:
Find the slides of the DevOpsCon Berlin 2016 talk [DE] here: