On May 12th, the new and innovative show Quizduell im Ersten premiered on German national television (ARD). The show is an adaptation of the very popular mobile game Quizduell, which has a user base of over 30 million, so expectations for the TV show were high.
I had previously blogged about the specific challenges of load testing such an application. That post prompted the makers of Quizduell im Ersten to ask me for help with exactly these problems. This post is a summary and translation of both articles.
grandcentrix approached me as an external load testing service provider. The company had published a German FAQ (no longer available) on their company blog describing the challenges and what went wrong. Most of the information in this article is based on that FAQ. Please note that, for obvious reasons, I'm not allowed to mention any specific details. Suffice it to say they were pleased with the results.
"Sebastian was of tremendous help to set up and conduct the required large scale load tests. To be able to watch our Mobile Mass Response platform under load, was more than helpful in those stressful days […]"
I won't go into detail on the game mechanics of Quizduell im Ersten (check them out for yourselves). One of the big technical challenges was achieving the so-called "TV synchronicity": enabling an interactive, real-time game experience for the TV audience playing along in the Quizduell app at home. Depending on the game state, the app communicates with the backend API at least once every 1 to 10 seconds. This means that, at peak, the request rate grows in direct proportion to the number of users playing the game.
In the FAQ mentioned above, grandcentrix explains where the essential problems lay. Some of the complexity and requirements were removed, and I was asked to conduct load tests to back up the refactoring efforts. Whereas the original tests had covered 85,000 requests per second, we hit the system with over 330,000 requests per second, which corresponds to about 1 million Quizduell players.
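The relationship between players, polling interval and request rate is simple division. The sketch below is a simplification that assumes every player sends exactly one request per polling interval (the minimum described above); the function names are illustrative, not part of any real tooling.

```python
def request_rate(players, poll_interval_s):
    """Steady-state requests per second if every player polls once per interval."""
    return players / poll_interval_s


def implied_poll_interval(players, rps):
    """Average polling interval implied by a player count and an observed request rate."""
    return players / rps


players = 1_000_000
# The app polls every 1 to 10 seconds depending on the game state.
print(f"{request_rate(players, 10):,.0f} to {request_rate(players, 1):,.0f} req/s")
print(f"implied interval at 330,000 req/s: {implied_poll_interval(players, 330_000):.2f} s")
```

Under this assumption, 1 million players generate between 100,000 and 1,000,000 requests per second, and 330,000 requests per second corresponds to an average polling interval of roughly 3 seconds.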
I worked very closely with the grandcentrix team on quality assurance, and we conducted many large-scale stress tests. In this article I'd like to outline a few more details about what we did.
The test cluster consisted of up to 50 AWS EC2 instances with a combined 800 cores, 1.5 TB of RAM and well over 50 Gbit/s of bandwidth. This capacity was chosen so that the load generators themselves could never become the bottleneck and distort the results.
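A quick way to check that a load generator cluster has enough headroom is to divide the target load across it. This minimal sketch assumes all 50 instances are in use, the load is spread evenly, and 50 Gbit/s is taken as the aggregate bandwidth (the text only says "well over"); `generator_load` is a hypothetical helper.

```python
def generator_load(target_rps, instances, cores, bandwidth_bps):
    """Split a target request rate evenly across a load generator cluster."""
    return {
        "rps_per_instance": target_rps / instances,
        "rps_per_core": target_rps / cores,
        # Bits each request may consume before the cluster's aggregate bandwidth saturates.
        "bit_budget_per_request": bandwidth_bps / target_rps,
    }


load = generator_load(target_rps=330_000, instances=50, cores=800, bandwidth_bps=50e9)
for key, value in load.items():
    print(f"{key}: {value:,.1f}")
```

At 330,000 requests per second this works out to 6,600 requests per second per instance, about 412 per core, and a budget of roughly 19 KB per request before bandwidth runs out. It also shows why each load generator looks nothing like a single real player to the backend.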
The test case was modeled so that the test clients actually play Quizduell: each simulated API client reacts to the different game states, honors the polling intervals the server instructs it to use, chooses game categories and answers questions (though not always correctly :)
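The client behavior described above can be sketched as a small state machine. This is an illustration only: the state names, the `ServerResponse` fields and the scripted server are hypothetical, since the real Quizduell API is not public. A simulated clock replaces real sleeping so the logic can be run instantly.

```python
import random
from dataclasses import dataclass, field

# Hypothetical game states; not the actual Quizduell API.
WAITING = "waiting"
CHOOSE_CATEGORY = "choose_category"
ANSWER_QUESTION = "answer_question"
FINISHED = "finished"


@dataclass
class ServerResponse:
    state: str
    poll_interval: float  # seconds until the next poll, as instructed by the server
    categories: list = field(default_factory=list)
    num_answers: int = 4


@dataclass
class SimulatedPlayer:
    rng: random.Random
    clock: float = 0.0  # simulated time in seconds instead of real sleeping
    log: list = field(default_factory=list)

    def react(self, resp):
        """Pick an action for the current game state, or None if there is nothing to do."""
        if resp.state == CHOOSE_CATEGORY:
            return ("category", self.rng.choice(resp.categories))
        if resp.state == ANSWER_QUESTION:
            # Answer at random, so the player is only right some of the time.
            return ("answer", self.rng.randrange(resp.num_answers))
        return None

    def play(self, responses):
        """Consume server responses in order, honoring each instructed poll interval."""
        for resp in responses:
            action = self.react(resp)
            self.log.append((self.clock, resp.state, action))
            if resp.state == FINISHED:
                break
            self.clock += resp.poll_interval
        return self.log


# A scripted stand-in for the backend, used only to exercise the client logic.
script = [
    ServerResponse(WAITING, 10.0),
    ServerResponse(CHOOSE_CATEGORY, 1.0, categories=["Sports", "Film", "Science"]),
    ServerResponse(ANSWER_QUESTION, 1.0),
    ServerResponse(WAITING, 5.0),
    ServerResponse(FINISHED, 0.0),
]
player = SimulatedPlayer(random.Random(42))
for t, state, action in player.play(script):
    print(f"t={t:5.1f}s {state:16} {action}")
```

Letting the server dictate the poll interval, rather than hard-coding one in the client, is what makes the simulated request rate follow the game state the way real apps would.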
Once the test was modeled, I provisioned the test systems and conducted and monitored each test run. The grandcentrix team could therefore focus entirely on analyzing internal metrics and logs, while a bot automatically reported the current test state (number of current users, current request rates, bandwidth, latency statistics, etc.) to a Slack chat room. After each run, all relevant metrics and charts were generated, then thoroughly analyzed and interpreted by the team.
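A status bot like the one described above can be as small as a formatter plus an HTTP POST. This sketch assumes a Slack incoming webhook, which accepts a JSON body with a `text` field; the function names and the metrics passed in are hypothetical, and the webhook URL would be your own.

```python
import json
import statistics
import urllib.request


def format_status(users, rps, mbit_per_s, latencies_ms):
    """Build a one-line status summary from the current test metrics."""
    # quantiles(n=100) returns 99 cut points: q[49] is p50, q[94] is p95, q[98] is p99.
    q = statistics.quantiles(latencies_ms, n=100)
    return (f"users={users:,} req/s={rps:,} bandwidth={mbit_per_s:,.0f} Mbit/s "
            f"latency p50={q[49]:.0f}ms p95={q[94]:.0f}ms p99={q[98]:.0f}ms")


def post_to_slack(webhook_url, text):
    """POST a message to a Slack incoming webhook and return the HTTP status."""
    body = json.dumps({"text": text}).encode("utf-8")
    req = urllib.request.Request(webhook_url, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Usage would be something like `post_to_slack(SLACK_WEBHOOK_URL, format_status(...))` on a timer during each run, where `SLACK_WEBHOOK_URL` is a placeholder for the webhook configured in your workspace.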
One of the bigger challenges was that the Quizduell API runs on Google App Engine. A deeper look into the runtime environment was therefore not possible, so we had to rely on the Google support team, which was outstanding.
DoS protection: Since the test setup did not have 1,000,000 servers and IP addresses, the traffic of up to a million simulated players came from only a few dozen machines, which triggered Google's DoS protection. This was quite a problem for some time, and the Google support team had to step in to permanently exempt the load generators from the DoS protection.
Google Magic: App Engine has many tuning knobs that control its (scaling) behavior, and some of them can only be changed by Google itself. We had to adjust these parameters carefully to further improve the response times and stability of the API.
Network: Strange things happen at high request rates and the network bandwidth they generate. Unusual effects on TCP connect timings, flow control, routing and other phenomena had to be traced back to their root causes, understood and, where possible, eliminated.
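One way to separate network effects from application latency is to time the TCP handshake on its own, without sending a request. The sketch below is a generic illustration, not the tooling used in these tests; the target host and port are whatever you want to probe, and passing an IP address keeps DNS resolution out of the measurement.

```python
import socket
import statistics
import time


def tcp_connect_time(host, port, timeout=5.0):
    """Return the seconds taken to establish a TCP connection (no request is sent)."""
    start = time.perf_counter()
    # create_connection returns once connect() completes, i.e. after the SYN/SYN-ACK exchange.
    with socket.create_connection((host, port), timeout=timeout):
        elapsed = time.perf_counter() - start
    return elapsed


def summarize(samples):
    """Median and worst-case connect time in milliseconds."""
    return {
        "median_ms": statistics.median(samples) * 1000,
        "max_ms": max(samples) * 1000,
    }
```

Comparing these connect timings with the full request latency at different load levels helps show whether a slowdown comes from the network path (for example, SYN retransmits) or from the backend itself.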
The comprehensive load tests conducted before relaunching the app-enabled show demonstrated that grandcentrix's "Mobile Mass Response" platform (no longer available) is capable of handling the expected load. The latest shows have confirmed that most of the initial performance problems have been resolved.
Here are a few numbers: over more than 50 load test runs, my setup sent a total of 1,213,583,187 requests to the Quizduell system and transferred about 2.21 TB of data. The error rate was about 0.0000216% (1 error every 4,624,616 requests).
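These figures can be cross-checked with a little arithmetic. The sketch below converts the "1 error every 4,624,616 requests" ratio into a fraction and a percentage and derives the implied average transfer per request; it assumes decimal terabytes (10^12 bytes) and that the ratio applies to the full request total.

```python
TOTAL_REQUESTS = 1_213_583_187
REQUESTS_PER_ERROR = 4_624_616
TOTAL_BYTES = 2.21e12  # assumption: decimal terabytes

error_fraction = 1 / REQUESTS_PER_ERROR
error_percent = error_fraction * 100
# Implied number of failed requests if the ratio holds for the whole total.
approx_errors = TOTAL_REQUESTS / REQUESTS_PER_ERROR
avg_bytes_per_request = TOTAL_BYTES / TOTAL_REQUESTS

print(f"error rate: {error_fraction:.3g} (= {error_percent:.3g}%)")
print(f"about {approx_errors:.0f} failed requests in total")
print(f"about {avg_bytes_per_request:,.0f} bytes per request on average")
```

A ratio of 1 in 4,624,616 is about 2.16 × 10⁻⁷ as a fraction, which is about 0.0000216% when expressed as a percentage, and the data volume averages out to roughly 1.8 KB per request.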
Last Friday (May 23) we conducted another round of large tests, adding more than 800 million requests and close to another TB of data transfer. We were able to identify and remove the remaining performance issues.
grandcentrix (@grandcentrix) May 24, 2014