StormForger's load testing engine has been based on Tsung since we started our SaaS offering in 2014. In 2019 we began designing a new engine, which has now been rolled out to all of our customers. Today I'd like to shed some light on where we are coming from and our reasoning for moving forward without Tsung.
I'd like to start with how we came to use Tsung1 as our basis to build what you now know as StormForger.
I have been a long-time user of Tsung since around 2009, almost 5 years before StormForger was founded. It started as the foundation of an evaluation framework for my master's thesis on "Design Patterns for Scalable, Service-oriented Webarchitectures" and I continued to use it in my time as a freelance consultant.2
Tsung was always fascinating to me, mainly because of its use of Erlang and its design approach. It is relatively old (first commit 20013) and often misunderstood in terms of its strengths and weaknesses. Articles trying to compare Tsung to other performance testing tools often fall short because of a lack of understanding of how Tsung is meant to be used.4
Tsung is incredibly efficient and scales easily to hundreds of thousands of concurrently active clients on a moderately sized distributed load generator cluster. Because of Erlang's, or rather BEAM's, soft real-time properties it works beautifully for measuring durations, which allowed us to run very large-scale tests early on.5
But Tsung is not without its flaws. One major shortcoming in the context of automation is that Tsung was designed to be used by an operator: a human sitting in front of a machine (or SSH'd into one), running commands, looking at logs and so on. Tsung does not provide nice, machine-readable error messages, and some statistics are in a strange format, meant to be processed by a Perl script generating gnuplot programs6. Over time we found several improvements and workarounds for the fact that Tsung was never meant to be automated. Changing this fundamentally, however, is very hard.
Another source of problems for us was that Tsung is a multi-protocol7 load testing tool, almost more of a framework for network-based performance testing. StormForger, on the other hand, has always been focused on HTTP. This sometimes made it very hard to change things in Tsung because of its generic approach to many aspects. For example, supporting HTTP/2 would have been a massive undertaking8, which we considered multiple times. In the end it is a good example of how you cannot have it both ways: highly generic yet still strong in specialised scenarios.
We came to the conclusion that we needed a purpose-built engine that is less generic and tailored specifically to our needs.
The goal for our new engine was specialisation: focused on HTTP and, most importantly, built to be automated and integrated from the beginning.
We have a lot of interesting features in mind to kick off the next generation of our engine, but our first goal was to have a new foundation to build upon. This was also important to us because we have to keep in mind the many thousands of test case definitions that our customers have written over the years. Since many of them run automatically, we need to minimize the migration effort for our customers.
We quickly came to the conclusion that we should aim for the following:
- Be as close as possible to a drop-in replacement for the current feature set.
- Support new need-to-have features right away where this does not directly conflict with the first point.
- Design and build for automation from the beginning, including live profiling in production, better monitoring and observability and simpler operations in general.
- Lay out a foundation for new features, with possible breaking changes to existing test cases.
So far we are quite pleased with the outcome: only a few undocumented special features we had built for individual customers needed adjustment. Without directly breaking compatibility, we were still able to bring some new features and better behaviour to all our customers right from the start. Examples include better debuggability, support for HTTP/2 and TLS 1.3, and an even faster "time to first request" when a test starts.
We also greatly improved our internal development and testing processes, which has already allowed us to deliver new features rapidly on several occasions.
The Migration Path
Before we actively started the migration of StormForger customers to our new engine, we ran a lot of internal experiments and sanity checks in addition to our usual automated test suite. Since our test case DSL is a declarative description, we could also challenge our new engine with many thousands of existing test cases to see if we had missed something – all without actually running tests against our customers' infrastructure.
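The idea of replaying a corpus of declarative definitions through a new engine's front end, without executing them, can be sketched roughly like this (a toy illustration: `parse_definition`, the token set, and the corpus are all made up for this sketch and are not StormForger's actual code or DSL):

```python
def parse_definition(source: str) -> list:
    """Toy stand-in for the new engine's parser: collect constructs
    in a definition that the engine does not (yet) support."""
    SUPPORTED = {"target", "session", "get", "post"}
    return [token for token in source.split()
            if token.isidentifier() and token not in SUPPORTED]

def sanity_check(corpus: dict) -> dict:
    """Parse every stored definition; report only the ones with gaps.
    No requests are made -- parsing is enough to surface missing features."""
    return {name: problems
            for name, source in corpus.items()
            if (problems := parse_definition(source))}

# Hypothetical corpus of customer test case definitions.
corpus = {
    "checkout-flow": "target session get post",
    "websocket-chat": "target session ws_connect",  # unsupported construct
}

gaps = sanity_check(corpus)
# Only definitions using unsupported constructs show up in `gaps`,
# pointing at exactly the features the new engine still needs.
```

The value of the declarative DSL here is that "does the engine understand this?" is answerable by parsing alone, so the whole historical corpus doubles as a compatibility test suite.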
The next step was migrating the first customers to our new engine. We started a few months ago by picking customers with whom we have shared Slack channels as part of their extended support package, and offered to let them give our new engine a try. At first we switched over only a couple of customers and observed their test runs very closely. Since many run their tests automatically on a daily basis, we were able to gather lots of data, address smaller issues right away, and push out updated versions rapidly.
Some weeks ago, we defaulted all new customers to our new engine, while only a small fraction remained pinned to our old Tsung-based engine. This helped customers with larger test code bases and many active DevOps teams to complete the required evaluation and ensure that everything was working as expected.
The Road Ahead
Our new engine solved some long-standing design problems right away and enabled new features from the start: mainly HTTP/2 and multiple open connections per simulated client, which was simply not possible with Tsung before. We also improved a lot of our internal tooling and significantly improved the tools we offer our customers to create and debug their test cases.
With our new engine and a new foundation in place, we are already working on new features related to capturing more metrics, better and deeper integration into our customers' development and QA processes, as well as making our test case DSL even more expressive.
We will keep supporting our beloved legacy engine for a few more weeks for the unlikely case that customers encounter an issue. After that we can tackle the next features we have in mind that would be breaking changes for our old engine.
It is not without mixed feelings that we have to say: farewell, dear Tsung. Thank you for many billions of requests and years of great service. We'll miss you. 🤧
Although we never denied that we use Tsung, we did not communicate it publicly. Some folks are surprisingly good at spotting details, as a commenter on Hacker News showed in 2014: he pointed out that parts of our logs looked like Erlang process identifiers and that many keywords in our DSL would be familiar to Tsung users. Crazy! ↩
I won't name names, but such comparisons are usually biased. Tsung seems to be the misunderstood underdog in most articles I've read so far, and I can't remember a writeup where Tsung actually came out as "the winner" 🤔 ↩
Supported protocols are HTTP, WebSockets, WebDAV, SOAP, PostgreSQL, MySQL, LDAP, MQTT, AMQP and Jabber/XMPP ↩
I don't want to go into all the details of why HTTP/2 would be quite hard to implement. Tsung comes from a time when there was little third-party ecosystem available for Erlang. Everything is hand-crafted and implemented in Tsung directly, down to the HTTP client and connection handling. While certainly possible, it was not feasible with our resources. ↩