BIG Data: Is this about data at all?
Inspired by: Rafal Lukawiecki’s seminar about Business Analytics and Big Data, Microsoft Norway
Intro
Wikipedia defines BIG Data as a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. But this definition leaves out the main reason why Big Data is relevant today and the purpose this technology was re-invented for. Here is a visualization map for BIG Data that shows how such a large amount of data is generated:
Fig1. Big Data visualization by WIPRO
As you can see from the figure, Big Data aims to represent a large set of data with a single point (a value or an expression) that makes sense to us.
BIG DATA: how big is World data?
Big Data is one of the best-known buzzwords in the world. It describes a new technology meant to handle, for analytic purposes, the large amount of data that is generated every day. But handling that amount is not really a problem: we see every day that hardware capacity expands as data volume expands, and hardware is also getting cheaper day by day. So BIG is not the point of Big Data, because hardware capacity can hold whatever Big Data turns out to be. The average data set in the whole world is estimated to be 1.5 GB, which fits on an average memory stick and even in the RAM (in-memory) of an average machine.
So if the issue is not capacity or size, what is it?
BIG Data: what about data?
Data is an important part of BIG Data, but is data the meaning of the concept behind BIG Data? The answer is NO. To be precise, BIG Data is just a meaningless buzzword created for the masses, and there is no real concept behind the name. If you look at the data, you can say nothing more than that it is big or small, that it has 1, 2, 3 … n sources, that it is expanding rapidly or slowly, and so on, but that is not what BIG Data is interested in solving at all. If you think size is what matters, you are wrong again: Big Data is not about BIGness at all.
BIG Data is interested in answering users, not developers. It is not an optimizing tool; it is actually an answering machine.
BIG Data: The real case?!
The real deal in BIG Data is that it tries to generate a single answer (I like to call it the single truth) from a huge input of data. If the answer is the only output of BIG Data processing, then logically BIG Data depends on the QUESTION. So the real deal behind BIG Data is the question itself. If you are in a dilemma about whether to choose BIG Data technology over traditional database technologies, you should look not at the size and not at the data itself, but simply at the queries you are going to run on that set of data. So I agree totally with Rafal when he says that the reason Big Data technologies exist lies in the answer that we want to get from BIG Data.
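To make the idea of "one answer per question" a bit more concrete, here is a minimal Python sketch. It is only an illustration, not how any particular Big Data product works: the readings() source and the question functions (average, share_above_50) are hypothetical names made up for this example, and the data is synthetic. The same input is streamed twice and reduced to a single value each time, and which value you get depends only on the question you ask.

```python
from typing import Callable, Iterable, Iterator


def readings(n: int) -> Iterator[int]:
    """Hypothetical stand-in for a huge input: lazily yields n synthetic values in 0..100."""
    for i in range(n):
        yield (i * 37) % 101


def average(data: Iterable[int]) -> float:
    """Question 1: what is the average value? Streams the data once."""
    total = 0
    count = 0
    for value in data:
        total += value
        count += 1
    return total / count


def share_above_50(data: Iterable[int]) -> float:
    """Question 2: what fraction of the values is above 50? Streams the data once."""
    above = 0
    count = 0
    for value in data:
        if value > 50:
            above += 1
        count += 1
    return above / count


def answer(data: Iterable[int], question: Callable[[Iterable[int]], float]) -> float:
    """Reduce the whole input to a single answer; the question decides what it means."""
    return question(data)


# Same data, two questions, two different single answers.
print(answer(readings(101_000), average))
print(answer(readings(101_000), share_above_50))
```

Notice that neither function cares how big the input is. What changes the result, and the amount of work, is the question.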
Conclusion
BIG Data is just another buzzword that has little to do with the content itself, but it is better to know that before we use it. Even though we can't change the trend behind this buzzword, at least we can support and use the technology as much as we can.