The “AWS Well-Architected Framework” is a guideline for reviewing and improving cloud-based architectures. Amazon Web Services recommends “general design principles” and describes best-practice examples.
In this blog post we sum up some advice on performance testing.
Let's start with an explanation of the general design principles suggested by AWS, followed by a closer look at some of the so-called “five pillars” of the AWS Well-Architected Framework (p. 2). A note on the “five pillars”: in an earlier blog post from October 2015, the author Jeff Barr counts only four pillars. So the pillar not mentioned there, “operational excellence”, seems to have gained importance over time.
Security, Reliability, Performance Efficiency, Cost Optimization & Operational Excellence – Five Pillars of the AWS Well-Architected Framework
In its definition of a Well-Architected Framework, AWS talks about how it helps customers make “architectural trade offs as your designs evolve” and about the effects on performance, and the lessons learned, after deploying into a live environment (p. 1). Based on this, AWS created the AWS Well-Architected Framework, a “set of questions you can use to evaluate how well an architecture is aligned to AWS best practices” (p. 2). Five topics (“pillars”) are defined: security, reliability, performance efficiency, cost optimization, and operational excellence (p. 2). AWS notes that customers have to “make trade-offs between pillars based upon your business context” (p. 2) when they improve their architecture. For example, optimizing performance efficiency has a crucial effect on your cost efficiency.
So we recommend analyzing performance and optimizing bottlenecks on a regular basis, because the gains carry over to other pillars such as cost efficiency.
General Design Principles
The definition of the AWS Well-Architected Framework is followed by a set of general design principles “to facilitate good design in the cloud” (p. 2). We pick out three points that concern load and performance testing. The first one deals with testing systems at production scale:
“In the cloud, you can create a production-scale test environment on demand, complete your testing, and then decommission the resources. Because you only pay for the test environment when it is running, you can simulate your live environment for a fraction of the cost of testing on premises.” (p. 3)
AWS makes it easy and affordable to test in the cloud: you can run your tests without a huge increase in costs.
Another point is AWS’s suggestion to drive improvement through “game days”:
“Test how your architecture and processes perform by regularly scheduling game days to simulate events in production. This will help you understand where improvements can be made and can help develop organizational experience in dealing with events.” (p. 3)
An event might be the sending of a newsletter, any other promotion, or a business-related event such as the launch of a new website or product. For such cases you can use spike testing, which is comparable to a load or stress test. If you are curious about the different types of testing, check out our blog post about types of performance testing.
One more notable point deals with allowing for evolutionary architectures:
“(…) In the cloud, the capability to automate and test on demand lowers the risk of impact from design changes. This allows systems to evolve over time so that businesses can take advantage of innovations as a standard practice.” (p. 3)
It is common to introduce new features or a new technical product into your architecture, test it in an automated, cloud-based environment, and learn about the performance impact of the change. Test automation in particular is one of StormForger’s major topics, as we deliver all the tools you need to do continuous load testing in the cloud.
Operational excellence is another pillar that is interesting when you consider load testing:
“The Operational Excellence pillar includes operational practices and procedures used to manage production workloads. This includes how planned changes are executed, as well as responses to unexpected operational events.” (p. 33)
As mentioned earlier in this article, spike testing and stress testing are powerful ways to analyze the effects that planned or unplanned events may have. AWS advises its customers to document, test, and regularly review “[all] processes and procedures of operational excellence” (p. 33).
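To make the idea of a spike test more concrete, here is a minimal sketch of how the shape of such a test could be described. It is an illustration only: the `SpikeProfile` class, its field names, and the linear ramp are assumptions made for this example and are not taken from the AWS white paper or from any particular load testing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpikeProfile:
    """Hypothetical description of a spike test; all fields are illustrative."""
    baseline_rps: int   # steady request rate before and after the event
    peak_rps: int       # request rate at the top of the spike
    warmup_s: int       # seconds at baseline before the spike starts
    ramp_s: int         # seconds to climb from baseline to peak
    hold_s: int         # seconds to stay at peak
    cooldown_s: int     # seconds at baseline after the spike


def target_rps(profile: SpikeProfile, t: int) -> int:
    """Return the target request rate at second t of the test."""
    if t < profile.warmup_s:
        return profile.baseline_rps
    t -= profile.warmup_s
    if t < profile.ramp_s:
        # Linear ramp from baseline towards peak (integer arithmetic).
        span = profile.peak_rps - profile.baseline_rps
        return profile.baseline_rps + span * t // profile.ramp_s
    t -= profile.ramp_s
    if t < profile.hold_s:
        return profile.peak_rps
    # After the hold phase the load drops straight back to baseline.
    return profile.baseline_rps


def schedule(profile: SpikeProfile) -> list[int]:
    """Per-second target rates for the whole test run."""
    total = (profile.warmup_s + profile.ramp_s
             + profile.hold_s + profile.cooldown_s)
    return [target_rps(profile, t) for t in range(total)]
```

A short ramp and a long hold approximate a newsletter going out; a longer ramp comes closer to a classic load test. The numbers you plug in should come from your own traffic expectations for the event.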
The white paper calls this pillar “Performance Efficiency”, which is a bit redundant, since performance already implies resource efficiency.
On page 20 the white paper asks: “How do you select the best performing architecture?” AWS offers various kinds of support and services to help answer this question and to help you establish a well-working architecture. But there is one thing to keep in mind: “data obtained through (…) load testing will be required to optimize your architecture” (p. 20). Of course, we totally agree: you just can’t avoid load and performance testing!
Depending on your workload, different resource types and sizes may fit your performance requirements. With a defined test case, it is easy to rerun the same test against different infrastructure setups and gather the data and insights you need.
“Deploy the latest version of your system on AWS using different resource types and sizes, use monitoring to capture performance metrics, and then make a selection based on a calculation of performance/cost.” (p. 53)
What AWS is suggesting here is what we call Configuration Testing. Configuration Testing covers not only your software configuration but also your entire environment. Rerunning performance tests is part of a review routine to “ensure that you continue to have the most appropriate resource type” (p. 56). Approaches change and new technologies emerge; use these developments to refine your architecture and improve its performance.
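The “calculation of performance/cost” that AWS mentions can be sketched in a few lines. This is one possible reading, not the white paper's method: it assumes that performance is measured as sustained throughput, that a setup must first meet a latency requirement, and that the cost is known per hour. The `RunResult` class and `best_setup` function are hypothetical names for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunResult:
    """Outcome of one load test run against one infrastructure setup."""
    setup: str              # label of the setup under test (your own naming)
    throughput_rps: float   # sustained requests per second during the run
    p95_latency_ms: float   # 95th percentile response time of the run
    hourly_cost: float      # cost of running the setup for one hour


def best_setup(results: list[RunResult],
               max_p95_ms: float) -> Optional[RunResult]:
    """Pick the setup with the most throughput per unit of cost.

    Setups that miss the latency requirement are excluded first, so a
    cheap but too slow setup can never win. Returns None if no setup
    meets the requirement.
    """
    eligible = [r for r in results if r.p95_latency_ms <= max_p95_ms]
    if not eligible:
        return None
    return max(eligible, key=lambda r: r.throughput_rps / r.hourly_cost)
```

To use it, run the same test case against each setup, record one `RunResult` per run from your own monitoring data, and pass the list to `best_setup` together with your latency requirement. Other definitions of performance (for example latency only) are just as valid; the point is to make the comparison explicit and repeatable.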
Load testing is also necessary when you use trade-offs to improve your performance [2]: “Trade-offs can increase the complexity of your architecture, and require load testing to ensure that a measurable benefit is obtained.” (p. 25)
It all boils down to this: performance testing is not a one-hit wonder. Continuity is key to success, and you should be set up to run performance tests whenever needed. In general, you should start testing early and do it on a regular basis. If possible, integrate performance analysis into your development and testing processes, ideally alongside your functional tests in your continuous integration system.
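One way to run performance checks alongside functional tests is a simple pass/fail gate on the results of each load test run. The sketch below is an assumption about how such a gate might look, not a description of a specific CI system or of StormForger's tooling; the function names, the nearest-rank percentile, and the two thresholds are chosen for illustration.

```python
def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of pct/100 * n, avoiding floating point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def check_budget(latencies_ms: list[float],
                 error_count: int,
                 total_requests: int,
                 max_p95_ms: float,
                 max_error_rate: float) -> list[str]:
    """Return a list of violations; an empty list means the run passes."""
    violations = []
    p95 = percentile(latencies_ms, 95)
    if p95 > max_p95_ms:
        violations.append(
            f"p95 latency {p95:.0f} ms exceeds budget of {max_p95_ms:.0f} ms")
    error_rate = error_count / total_requests
    if error_rate > max_error_rate:
        violations.append(
            f"error rate {error_rate:.2%} exceeds budget of {max_error_rate:.2%}")
    return violations
```

A CI job could call `check_budget` with the measurements of the latest run and fail the build when the returned list is not empty. The budgets themselves should come from your own requirements and earlier baseline runs.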
AWS has released a fine and valuable white paper that is worth reading: a practical guideline for designing and steadily improving a well-architected system. AWS also offers a lot of useful services you might want to check out. We regularly see most of the problems it addresses in the wild, and we give the same recommendations.
We always recommend starting early with small, simple, and understandable test scenarios. It is easy to set up your first load test with StormForger for free. Sign up, learn more in our documentation, and get in touch with us for a personal onboarding.
[2] Focused on space-time trade-offs for performance efficiency.