At the NoSQL Matters 2014 conference in Barcelona I was privileged to give a talk on how we at StormForger go about screening and selecting NoSQL and other technologies with which to build our products. Although NoSQL is not necessarily the primary focus here at StormForger, we still use a number of these exciting systems, and I would like to share our thoughts.
Startups and NoSQL
Startups are agile and open-minded; they often consist of small teams and have to be pragmatic for that reason. I won't go into defining NoSQL here; I'd like to refer to the Wikipedia definition for that. What matters is that NoSQL is (or at least should be) chosen for one of two reasons: ease of use, or because you have a very special problem to solve.
I'd argue that most of the time, you don't have a truly special problem to solve. You are probably not dealing with big data (terabytes of data, or many billions of items), you don't have to guarantee five-nines availability, and you don't have to scale to millions of requests per second right away…
If you want to be lean and agile, you should focus on the ease of use aspects (development & operations). NoSQL is about modeling data by means other than tabular relations, so maybe your data model fits a document store, a column store, or a key-value store. If that is the case, it might be "easier" to just use a NoSQL database instead of fiddling around with mapping your data structures onto PostgreSQL or MySQL.
But ease of use does not stop at how you model your data. Keep in mind that you still need integration with your environments, languages, and frameworks; you need good tooling for your operational needs; general maturity and reliability are also important; and last but not least, you need some kind of support, be it community or commercial.
Polyglot Persistence at StormForger
We at StormForger — besides being a startup — have different kinds of needs for data persistence. Following the polyglot persistence approach, we have found different tools for each job.
Here are some examples for NoSQL usage at StormForger. We use…
- InfluxDB for time series data,
- Redis for all our caching needs, and
- Elasticsearch for log aggregation & analysis.
There is no one size fits all solution for us — and most probably not for you either!
Being lean and pragmatic
We have one more use case for data that is not really a good fit for tabular relations: our test case definitions consist of highly structured and complex data.
Although we have begun to evaluate solutions for this need as well, we realized we can be pragmatic about it for now. Our current solution? Serialize the JSON and just stuff it into MySQL. We might need a more sophisticated solution in the near future, but we don't need it right now to build our MVP and test our first assumptions about what the customer actually wants.
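The "serialize and stuff it into MySQL" pattern is simple enough to sketch in a few lines. Everything below is illustrative: the table, column names, and test case fields are made up, and Python's built-in sqlite3 stands in for MySQL purely so the example is self-contained. The pattern is identical with MySQL and any of its drivers.

```python
import json
import sqlite3

# Hypothetical schema: the structured definition lives as opaque JSON in
# a plain TEXT column next to ordinary relational columns. sqlite3 is
# used here only to keep the sketch runnable without a server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_cases (id INTEGER PRIMARY KEY, name TEXT, definition TEXT)"
)

# A made-up, nested test case definition.
definition = {
    "target": "https://example.com",
    "phases": [
        {"duration": 60, "rate": 10},
        {"duration": 300, "rate": 100},
    ],
}

# Write: serialize to JSON and store it like any other string.
conn.execute(
    "INSERT INTO test_cases (name, definition) VALUES (?, ?)",
    ("smoke-test", json.dumps(definition)),
)

# Read: fetch the row and deserialize; application code gets the full
# nested structure back.
row = conn.execute(
    "SELECT definition FROM test_cases WHERE name = ?", ("smoke-test",)
).fetchone()
loaded = json.loads(row[0])
assert loaded == definition
```

The obvious trade-off is that the database cannot query inside the serialized blob; filtering or indexing on nested fields means deserializing in the application. That is exactly the point at which the more sophisticated options discussed below start to pay off.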
Especially with the limited resources you have in a startup context, it's very important to take a step back every time you encounter an interesting new technology. I often fall for the fancy new stuff myself, but as a startup we want to get one thing right: be lean and test whether the product we envision is actually what the customer wants.
It can be perfectly fine not to have the optimal technical solution upfront. Maybe it's fine to serialize your structured data like we do, or maybe you can just use PostgreSQL's hstore or the upcoming jsonb fields.
If your product is indeed fundamentally based on dealing directly with highly structured data, take a look at document stores with powerful query capabilities, such as ArangoDB. Or perhaps you have to crunch terabytes of data with tools from the Hadoop ecosystem. In all other cases: be agile, think lean, and focus on validating your ideas!
Besides the embed here, you can also find the slides on Speaker Deck.