Blogging about data is popular; blogging about databases is not. Unless, that is, you’re a techie or supertechie. Gerrit Vos is neither, but nevertheless he’d like to throw some light on the subject. Particularly now, given that data plays such an indispensable role in our work and lives.
It all sounds so familiar: the more serious the organisation, the more seriously it needs to take the storage of its data. And for this reason, we built expensive, complex database management systems (DBMS). During the past 25 years there has been one standard: the relational database (RDBMS). Oracle, IBM (DB2) and Microsoft (SQL Server) dominated the database landscape. In the world of finance too.
A huge ask
With an RDBMS, you have to decide carefully beforehand which entities you will use and what data they must contain. Retrospective changes are possible, but they can prove difficult. It’s also important that not everyone can simply add or delete data. Furthermore, the ACID properties (atomicity, consistency, isolation and durability) must be guaranteed, several people must be able to use the RDBMS simultaneously and its performance must also be good. All in all, a huge ask.
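For readers who like to see such things in code, here is a minimal sketch of two of those demands: a schema that is fixed up front, and a transfer that either happens completely or not at all. It uses SQLite from Python's standard library purely for convenience; the `account` table, the names in it and the `transfer` helper are illustrative assumptions, not taken from any real banking system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The schema is decided before any data is stored.
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " holder TEXT NOT NULL,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO account (id, holder, balance) VALUES (?, ?, ?)",
    [(1, "Alice", 100), (2, "Bob", 50)],
)
conn.commit()


def transfer(conn, source, target, amount):
    """Move money between two accounts as one atomic unit."""
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint refused a negative balance.
        return False


def balances(conn):
    """Return {account id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


print(transfer(conn, 1, 2, 30))   # True
print(transfer(conn, 1, 2, 500))  # False: both updates are undone
print(balances(conn))             # {1: 70, 2: 80}
```

The second transfer credits the target first and only then fails on the debit, yet neither change survives. That is the atomicity part of ACID at work.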
The use of an RDBMS is perfectly normal in banks, insurers and pension funds. There are even organisations among these that think it’s the only option. But I have good news for them: there are more options, and there have been for quite some time. Sometime around 2000, a movement that challenged the complexity and cost of the RDBMS started gaining momentum. Because alternatives were lacking at the time, its members started developing their own. The movement won more followers, became better organised and started working on alternative ways of storing data and making it searchable. Before long there was talk of a new industry standard: NoSQL. Various interpretations have been provided, but I think the best definition is: non-relational.
Dynamo, Bigtable and Cassandra
Organisations such as Amazon, Facebook and Google made substantial contributions to the development of NoSQL. Amazon developed Dynamo, for example, Google invested its energy in Bigtable and Facebook created Cassandra. Performance issues and their distributed environments meant they had to go in search of alternatives. Universities also helped with the development of NoSQL, such as the Massachusetts Institute of Technology with its self-developed H-Store.
During the past seven years these NoSQL databases have really taken off. You can make all kinds of cross-sections of them, by looking at memory usage, for example, or storage technology, hashing and security techniques, distribution, consistency and applicability. It’s easiest to study the basic principle: storage technology. Are you dealing with Key-Value, Document stores or Column-Oriented stores?
- Key-Value stores (such as Dynamo, Voldemort, Tokyo Cabinet, Redis and Scalaris) have a simple model: values are stored and retrieved using keys. The technology is scalable but offers relatively weak consistency, which makes it less suitable for applications in which analysis plays the main role.
- Document stores (such as Apache CouchDB and MongoDB) use documents as a storage technique. They can process more complex and meaningful structures without imposing restrictions. Moreover, all of these standards can be used with one another, rendering migrations unnecessary.
- In Column-Oriented stores (such as Sybase IQ, Vertica, Bigtable, H-Store (MIT) and Cassandra) data is stored by column rather than by row. The principle is straightforward but the technical implications are quite significant, mainly because of high demands for consistency, distribution and performance.
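To make the three storage models a little more tangible, here is a rough sketch using nothing but plain Python data structures. It only shows the shape of the data in each model, not how Redis, MongoDB or Cassandra work internally; all keys, field names and helper functions are made up for the illustration.

```python
# Key-Value: an opaque value is stored and fetched by its key, nothing more.
kv_store = {}
kv_store["session:abc"] = "user=42;lang=nl"

# Document: each record describes itself and may have its own fields.
documents = [
    {"_id": "c1", "name": "Jansen", "city": "Utrecht", "products": ["savings"]},
    {"_id": "c2", "name": "De Vries", "city": "Delft", "partner": {"name": "K. de Vries"}},
]


def find(docs, field, value):
    """Return every document whose top-level field equals value."""
    return [doc for doc in docs if doc.get(field) == value]


# Column-oriented: the same rows, but stored as one list per column.
rows = [
    {"id": 1, "city": "Utrecht", "balance": 100},
    {"id": 2, "city": "Delft", "balance": 250},
    {"id": 3, "city": "Utrecht", "balance": 75},
]


def to_columns(rows):
    """Turn a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

print(kv_store.get("session:abc"))
print(find(documents, "city", "Delft"))
print(sum(columns["balance"]))  # only the balance column is read
```

The last line hints at why column storage suits analysis: adding up all balances touches one list and never looks at the other columns.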
The policy as a document
What exactly does all this prove? Well, to start with, that developments in data storage and usage are taking place very quickly. While it’s impossible to predict who the dominant NoSQL players will be, it has become abundantly clear that the idea that we need just one RDBMS for all systems is outdated. For reasons of cost, flexibility, implementation speed and maintenance, a NoSQL database can be a better alternative. Ask yourself this: is it really such an outlandish idea to save a policy as a document? Or regard a savings account as a collection of transactions?
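As a thought experiment, those two questions could be sketched roughly as follows. The policy fields and the `SavingsAccount` class are invented for the illustration and do not reflect any real policy administration or banking system.

```python
import json
from dataclasses import dataclass, field
from decimal import Decimal

# A policy saved as one self-contained document (hypothetical fields).
policy = {
    "policy_number": "P-0001",
    "holder": {"name": "Jansen", "city": "Utrecht"},
    "cover": ["fire", "theft"],
    "premium_per_month": "12.50",
}
stored = json.dumps(policy)    # what would be written to a document store
restored = json.loads(stored)  # what would be read back


@dataclass
class SavingsAccount:
    """An account that is nothing more than its list of transactions."""

    holder: str
    transactions: list = field(default_factory=list)

    def deposit(self, amount):
        self.transactions.append(Decimal(amount))

    def withdraw(self, amount):
        self.transactions.append(-Decimal(amount))

    @property
    def balance(self):
        # The balance is derived from the transactions, never stored itself.
        return sum(self.transactions, Decimal("0"))


account = SavingsAccount("Jansen")
account.deposit("100.00")
account.withdraw("30.50")

print(restored == policy)  # True
print(account.balance)     # 69.50
```

In this view there is no separate balance field that can drift out of step with the transaction history; the balance is simply whatever the transactions add up to.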
Now while I don’t believe that companies will simply abandon their existing systems, I do think it’s important to keep a keen eye on developments. Had Google, Facebook and Amazon not travelled such creative new paths, the online world would currently look very different. Probably about the same as it did some 15 years ago. And is that really what we want?