Don’t turn Data Science into rocket science
Big Data, if we’re to believe the experts, will change the world. Our world anyway. But how exactly will it do this, and how should it all be deployed effectively? Well, certainly not by endlessly philosophising about it, insists Gerrit Vos.
Data Science, or Big Data as it’s often referred to, is still the talk of the town. But is it just hype, or is it here to stay? Data should be the most important aspect of an organisation. In fact it should even be on the balance sheet. Those in the know compare its impact to that of the Internet 20 years ago: a radical change, they claim, and above all a disruptive one. The Dutch conference circuit is doing pretty well out of it, and good for them too. The question, however, is whether it’s all really that new.
Let’s get back to the basics. Two things are relevant in systems: processes and data, neither of which can do without the other, like Adam and Eve. Processes can work well, but if nothing is recorded it’s just fleeting, nothing more than an experience. Like a night out with friends, but without your iPhone. The same can be said about data. All those photos are great, but if you don’t show them to people now and again you might as well not have taken them.
During the past few years countless methodologies for developing systems have been introduced. While one methodology might have placed the emphasis on the process side (BPM), another will have favoured the data side (EAR). And, of course, there were also combinations of the two (ISAC). But with the advent of agile methodologies, the discussion has been put on the back burner. With the new methodologies the emphasis is more on collaboration, responsibilities and tools.
This is why it’s so amusing to see the re-emergence of the discussion about what’s important. The approach is now different: it’s become about the opportunities that new technical options offer. Whereas in the past it was impossible to store and search through hundreds of terabytes of unstructured data, now it’s merely a question of using the right tools, the right data scientist, a deep-enough pocket and hey presto, job done!
Organisations interested in using Data Science need to ask themselves three questions: where do we start; which tools do we need to use; and what should we do with our data warehouses?
WHERE DO WE START?
The answer to the first question is easy: if you don’t have a concrete problem with your information flow, don’t start at all. In all other cases, formulate the problem clearly and succinctly (but it must be a real problem!).
WHICH TOOLS DO WE NEED TO USE?
Answering the second question is trickier because the possibilities are legion. They can be broken down into three categories: BI tools (a great many), artificial intelligence/machine learning tools (not so many) and real Data Science tools (just a few). The last of these categories is the most interesting. It combines data visualisation, intelligence, unprecedented storage capacity (structured and unstructured), and quick results. It might seem like a difficult choice, but knowing what you’re looking for makes it all a lot easier.
WHAT SHOULD WE DO WITH OUR DATA WAREHOUSES?
Which brings us to our third and final question: what should we do with our data warehouses? Some specialists are of the opinion that a gradual transition from data warehouses to data lakes is possible. Perhaps this has something to do with the huge investments that have been made (and who will have to tell the boss or the shareholders…). Others, myself included, question the wisdom of such a gradual transition. The principles of structured storage (files, databases) and unstructured storage (photos, emails, Excel files, dossiers) are completely different. And that’s not taking into account the possibilities of involving external sources to analyse the process. Don’t get me wrong here, data warehouses are still extremely useful and will continue to be so for many years. But we’ll eventually have to replace them with Data Science tools that can access all company and external data sources.
And that brings us back full circle to the question we started with. What is important: data or processes? Data mining or process mining? I think it’s simple: sometimes it’s one and sometimes it’s the other. It depends on what you want. Smart application gives the best results, and if you’re really smart you’ll be sure to bear both in mind. Adam and Eve!