Oct 12, 2015


Written by Sachin Dabir

Big Data Paradigm

“... But what is Big Data?”, “What technologies are called Big Data?”, “Where do I start?”, “How do I get budget for a Big Data project?”

These are the questions you commonly hear when you meet enterprises and mention ‘Big Data’. Often the term is greeted with a sarcastic smile or dismissed as a sales pitch. Unfortunately, there is an element of truth behind that sarcasm and disdain.

Big Data has become a buzzword in an industry that loves this kind of hyperactivity. From being an approach to processing new data on a continuous basis, it has become an industry unto itself. Every company wants to jump on the bandwagon and literally tags (or hashtags) its offering as Big Data. From hardware manufacturers to website developers, from programmers to content writers, everyone wants to position their offering for the ‘Big Data industry’. In the process, the focus on real solutions, needs and objectives is getting lost. For customers who want a real conversation and want to understand real solutions, this kind of noise is distracting, and so you sometimes get the reactions mentioned above.

I always like to bring the conversation back to the basics.

The key question is: why is everyone talking about Big Data? The industry has always dealt with data, including large data, and the RDBMS has been around for over 30 years. So the important question is what has changed and how it is impacting business. The often-quoted reasons are the three Vs: volume, velocity and variety of data. And they are true. According to an IBM study in 2013, 90% of the data that existed had been generated in just the two years preceding 2013. Take a breath and consider the enormity of this number. Out of more than 40 years of modern computing history, such an enormous amount of data was generated in just two years!

Quite often, the people who are supposed to look at the data don’t even know that they have access to a large amount of it. We witnessed this first hand when a customer approached us to fix a problem with their existing database system. Apart from solving the problem, we asked what they were doing with the logs and other data that their equipment was generating. They said that the data was not used by any application and hence was being purged regularly. When we showed that they could get a lot of business insight from that data (something they were partially aware of), they mentioned a second problem: how to store and analyze it. That brings us to the second part of the challenge. But staying with the first part for now: an enormous amount of data is being generated, and it is not being put to use.

The case of this customer is not isolated. Every type of business is witnessing this data explosion. Some are aware of it, some are not. Some don’t know how to store the data, and some don’t know how to leverage it (and at what cost). The data comes from multiple sources. B2C businesses get data from multiple customer interaction channels: websites, online shopping, social media interactions, marketing promotions, call centre interactions, in-store video analytics and so on. And this data does not come in one single form; it comes as text, logs, videos and audio files. Even for B2B businesses there is a massive amount of available data that existing systems have not put to use.

Apart from ‘volume’ and ‘variety’, the ‘velocity’ of data generation is a big challenge. It is not enough to just store the data, clean it, analyze it and produce a report from it. What matters is how fast you can produce actionable output from the data after it hits your system. If you want to run in-store promotions or make location-specific, time-bound offers, you have to act upon the data within a given time window, at times within seconds or even microseconds. This requires a different kind of system from your existing offline reporting systems.
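To make the idea of a time window concrete, here is a minimal sketch in Python. It is only an illustration, not a description of any particular product: the names `ACTION_WINDOW`, `is_actionable` and `actionable_events`, the five-second window, and the event layout (a dict with a `"time"` field) are all assumptions made for this example.

```python
from datetime import datetime, timedelta

# Hypothetical window: how long an event stays useful for a time-bound offer.
ACTION_WINDOW = timedelta(seconds=5)


def is_actionable(event_time, now, window=ACTION_WINDOW):
    """Return True if the event arrived recently enough to act on."""
    age = now - event_time
    # Events stamped in the future (negative age) are treated as invalid.
    return timedelta(0) <= age <= window


def actionable_events(events, now, window=ACTION_WINDOW):
    """Keep only the events whose age falls inside the window."""
    return [e for e in events if is_actionable(e["time"], now, window)]


if __name__ == "__main__":
    now = datetime(2015, 10, 12, 12, 0, 0)
    events = [
        {"id": 1, "time": now - timedelta(seconds=2)},   # still fresh
        {"id": 2, "time": now - timedelta(minutes=10)},  # too old for an offer
    ]
    for event in actionable_events(events, now):
        print("act on event", event["id"])
```

An offline reporting system would process both events in its next batch run; a system built for velocity has to make this kind of decision while the first event is still inside its window.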

To summarize, the situation today is significantly different from what it was just a couple of years back. We have a massive ‘volume’ of data, it comes in a ‘variety’ of types, and its ‘velocity’ has not been seen before. Hence our approach to capturing, storing and analyzing data and producing actionable output in a time-bound manner has to be different from what we have today. We need to look at technologies that address this entire paradigm shift. And hence we talk of the ‘Big Data Paradigm’.

Interesting times are ahead.

