Having had the opportunity to support the grandcentrix team with Quizduell im Ersten in May, I was happy to once again be called in to assess another interactive TV show with a second-screen app.
For RTL interactive I tested a sophisticated backend for an upcoming casting show. The backend was accessed through their RTL Inside app (iTunes Store, Google Play). This opportunity presented me with a very interesting scenario, and in this post I would like to outline how their system was designed and how I was able to test the performance and scalability of the architecture.
"We were actually quite relaxed during the show premiere. Everything worked
flawlessly and our complex workflows performed pretty well. Sebastian provided a
great deal of help with his professional guidance and the extensive load tests!
This brought us very close to an optimal situation, where the going live is not
the first-time peak-traffic for the architecture we designed. Load testing from
a professional third party is a major success factor for a software project and
should be on everyone's top priority list."
— Jérôme Patt, Project Manager at RTL interactive
Please note that, for obvious reasons, I'm not allowed to mention any further specific details. Suffice it to say that RTL interactive (RTLi for short) was pleased with the findings and all the systems tested performed well in production. Please direct your inquiries regarding RTL interactive, RTL, or the mentioned show to RTL's press office. If you have a similar project or any questions regarding performance, load, or scalability testing, please feel free to get in contact with me!
The show format is that of a casting show (otherwise known as a talent show). Several acts are introduced to the audience and perform behind a giant wall of TV screens.
The unique feature of this casting show is that the viewers at home are also the jury. While an act performs on stage, the audience at home can decide with their vote whether they like the particular act or not. The big display wall fills up with pictures of viewers who voted in favor of the act. As the voting progresses, the audience's appreciation is shown, and once the act reaches 75% approval before the performance is over, the wall is lifted and the act may continue to the next round.
Before viewers can participate, they have to download and install the RTL Inside app and register to be a show juror. They can then choose to
- upload a photo,
- use their Facebook picture, or
- use no picture at all.
For every show act there are two interaction steps for the viewers at home:
Check-In: In the check-in phase, viewers have to signal whether they are going to vote for the next act or not. The check-in is required and is later used to calculate the total percentage value. When the check-in phase is over and the act starts to perform, the voting phase begins.
Vote: During the voting phase, viewers use the RTL Inside app to signal whether they like the current act or not.
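To make the two phases concrete, here is a toy model of the check-in and vote flow. The 75% threshold comes from the show's rules described above; how RTLi actually computed the percentage was not disclosed, so the reading below (positive votes divided by checked-in jurors) is purely my own illustrative assumption:

```python
# Toy model of the check-in / vote flow for one act. This is an
# illustrative sketch, not RTLi's actual vote-tallying logic.
THRESHOLD = 0.75  # approval needed for the wall to lift


class ActVoting:
    def __init__(self) -> None:
        self.checked_in: set[str] = set()
        self.positive: set[str] = set()

    def check_in(self, juror_id: str) -> None:
        # Only checked-in jurors count toward the denominator.
        self.checked_in.add(juror_id)

    def vote(self, juror_id: str, likes: bool) -> None:
        if juror_id not in self.checked_in:
            return  # votes without a prior check-in are ignored
        if likes:
            self.positive.add(juror_id)
        else:
            self.positive.discard(juror_id)

    def approval(self) -> float:
        if not self.checked_in:
            return 0.0
        return len(self.positive) / len(self.checked_in)

    def wall_lifts(self) -> bool:
        return self.approval() >= THRESHOLD
```

Modelling check-in and vote as separate steps also mirrors why the backend saw two distinct, synchronized request peaks per act.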
Needless to say, an interactive show format like this running on RTL has strict quality and performance requirements. Besides the potential reach of such a show, the synchronized check-in and voting phases were expected to produce a significant number of requests at their peak.
About 2,000,000 jurors signed up using the "RTL Inside" app, resulting in 7,400,000 positive votes and 9,900,000 check-ins.
— RTL press release, roughly translated from German.
Background & Architecture
The backend systems were developed in-house at RTL interactive. I had the opportunity to work very closely with the development and operation teams and was impressed by the chosen architectural approaches.
RTLi decided to run the show's backend systems on Amazon Web Services (AWS). They committed to an interesting hybrid approach, using many AWS managed services together with custom services built on top of Amazon EC2.
The RTL Inside app had a special area created for the show, where an in-app browser provides the user interface. The app communicates directly (authenticated via AWS Signature Version 4) with various AWS services, such as Amazon S3 and Amazon SQS. This allows the system to scale almost automatically without having to deal with auto scaling or any other complicated moving parts. Other components were handled by services running on Amazon EC2 and third-party content delivery networks.
Performance, Load & Scalability Testing
When launching a high-profile interactive TV show format like this, you have to test relentlessly and extensively before you go live. Even though the RTLi team put great effort into making optimal use of the scaling properties of AWS, dynamic performance and scalability testing is always mandatory. Results and findings from our load tests were also used as a basis for the capacity planning process.
The system itself was very pleasant to test, since the majority of interactions happened through AWS APIs. The architecture makes heavy use of Amazon SQS internally, which results in a highly decoupled environment. As a result, most parts could be tested in isolation, one after another, which is always a great testing property, especially when you aim for very high throughput and service quality. Another great aspect: while the team was investigating a finding from a previous test run, the other system components could still be tested independently, thus cutting down on the time invested.
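Whatever the component under test, each isolated run eventually boils down to latency and throughput summaries that can be compared against the service quality targets. A minimal nearest-rank percentile helper, as a sketch of the kind of summary we looked at per component (my own illustration, not tooling from the project):

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are at or below it.
    Assumes 0 < p <= 100 and a non-empty sample list."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def summarize_latencies(samples: list[float]) -> dict[str, float]:
    # Typical per-component view of a load-test run: median plus tail.
    return {
        "p50": percentile(samples, 50),
        "p95": percentile(samples, 95),
        "p99": percentile(samples, 99),
    }
```

Looking at tail percentiles (p95/p99) rather than averages matters here, since a synchronized voting phase pushes many users into the slow tail at exactly the same moment.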
Modelling Test Cases
Several test scenarios were modelled, from simple cases testing single components to complete, comprehensive test cases that interact with all APIs just as a user would through the app: sign-up, photo upload, state polling, check-in, vote.
Since almost all test cases had to interact with AWS, I had to implement the AWS Signature Version 4 calculation algorithm as efficiently as possible. Since the signature is based on the request (e.g. its payload), and every user has their own unique API keys, authentication, and device tokens, pre-computation was not an option. Efficiency was important because we wanted to test with a lot of users, each generating a good number of concurrent requests!
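For readers unfamiliar with SigV4, here is a stripped-down sketch of the signing steps using only the Python standard library (no query string, only the `host` and `x-amz-date` headers; all credentials, regions, and endpoints in the usage below are placeholders, not values from the project). The key-derivation chain in step 3 shows why per-user pre-computation does not help: it depends on each user's secret key, and the final signature additionally depends on the payload of every individual request:

```python
import hashlib
import hmac


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sigv4_auth_header(access_key: str, secret_key: str, region: str,
                      service: str, method: str, canonical_uri: str,
                      payload: str, host: str, amz_date: str) -> str:
    """Build an AWS SigV4 Authorization header for a simple request.
    Sketch only: no query string, no extra headers, no session token."""
    date_stamp = amz_date[:8]  # YYYYMMDD

    # Step 1: canonical request (hash of the exact request to be sent).
    payload_hash = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    canonical_headers = f"host:{host}\nx-amz-date:{amz_date}\n"
    signed_headers = "host;x-amz-date"
    canonical_request = "\n".join([
        method, canonical_uri, "",  # empty canonical query string
        canonical_headers, signed_headers, payload_hash,
    ])

    # Step 2: string to sign, scoped to date/region/service.
    credential_scope = f"{date_stamp}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, credential_scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    # Step 3: derive the signing key. This HMAC chain is unique per
    # secret key, day, region, and service, so it cannot be computed
    # once and shared across simulated users.
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    k_signing = _hmac(k_service, "aws4_request")

    # Step 4: sign and assemble the Authorization header.
    signature = hmac.new(k_signing, string_to_sign.encode("utf-8"),
                         hashlib.sha256).hexdigest()
    return ("AWS4-HMAC-SHA256 "
            f"Credential={access_key}/{credential_scope}, "
            f"SignedHeaders={signed_headers}, Signature={signature}")
```

In a load test, the only realistic optimization is caching the step-3 signing key per simulated user for the day, while steps 1, 2, and 4 still have to run for every single request.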
Some services were developed and operated by RTLi on Amazon EC2 behind Amazon Elastic Load Balancing. Needless to say, they had to be thoroughly tested as well. General service quality, performance, and scalability were the primary testing goals, besides system stability in edge-case and disaster scenarios. Having a solid basis for capacity estimation and understanding the scaling properties of those systems and services was very important, too.
Despite the testability of system components in isolation, we also performed extensive end-to-end tests to assess the entire process chain: from processes running on Amazon EC2 and background workers consuming Amazon SQS messages down to data feeds for the TV studio and administrative dashboards reading data from various metric systems and Amazon DynamoDB. In addition to the client-facing APIs, several background systems were involved, e.g. to process votes and to aggregate and analyze data.
The extensive end-to-end tests served several goals:
- testing component integration under load
- gathering data for capacity planning
- performing scalability analysis for services running on Amazon EC2
- assessing the system stability under stress and peak workloads
- ensuring that service quality requirements are fulfilled
On a related note: AWS also recommends testing architectures built with AWS services in a proof-of-concept approach. As always, be sure to give your provider a heads-up when you run large-scale load tests! :) For AWS this could mean pre-warming Amazon Elastic Load Balancers, changing the partitioning of Amazon S3 buckets for very high throughput, or increasing all kinds of account limits. If you are in doubt, reach out to AWS Support.
We had a number of important findings, all of which could be addressed before the first show, and I was told that all systems worked flawlessly in production. Together with the excellent AWS Enterprise Support, the RTLi team and I were able to pinpoint unexpected effects and conduct root cause analysis for the strange latency impacts and service behavior we saw during the tests.