Speak to the average IT person about Big Data today and you will get one of three answers:
– We don’t have it, that’s just for the large on-line guys!
– Speak to the BI team, they are responsible for data analytics.
– We don’t know how to build the business case.
Ask for a definition of “Big Data” and that’s where the fun starts. Some will talk about the classic 3 V’s – velocity, variety and volume – but seldom do you hear anything that gives you a clear idea of either the value or the methodologies involved.
I think that this name “Big Data” is a misnomer! Firstly, I’m not convinced that it’s about “Big”: I have spoken to people who have gained immense value from understanding their small data. Secondly, I agree with Harper Reed, who says it’s not about the Data… it’s about the Answers! (Harper was the CTO for the Obama campaign and ran an immense Big Answers project; have a look here for more.)
To me, this thing called Big Data is about understanding what the data has to tell you!
How? Simply by bringing relevant data together to create context, then presenting that data in a human-digestible form.
Bringing data together is where the magic starts: we bring all kinds of data together, and the technologies allow us to link, map and/or match this disparate data. This is very different from the traditional technologies. When people ask why we can’t use the existing technologies, I say there are two problems: Relational Databases and Relational Databases.
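The linking idea can be sketched in a few lines. This is a minimal, hypothetical example (the data, field names and the e-mail matching rule are all assumptions, not any particular product’s method): two sources that share no common schema are matched on a normalised key.

```python
# Hypothetical CRM contacts and web clickstream events: different
# shapes, no shared schema, but both carry an e-mail address.
crm = [
    {"Name": "A. Smith", "Email": "A.Smith@Example.com", "Segment": "SMB"},
]
clicks = [
    {"user": "a.smith@example.com", "page": "/pricing"},
    {"user": "a.smith@example.com", "page": "/signup"},
]

def normalize(email):
    # Matching rule assumed here: case-insensitive, trimmed e-mail.
    return email.strip().lower()

# Index one source on the normalised key, then link the other to it.
by_email = {normalize(c["Email"]): c for c in crm}

linked = []
for event in clicks:
    contact = by_email.get(normalize(event["user"]))
    if contact:
        # Merge the two records into one contextualised row.
        linked.append({**contact, **event})

for row in linked:
    print(row["Name"], row["page"])
```

Real linking is of course fuzzier than an exact key match, but the principle is the same: the join logic lives in code you can change, not in a schema fixed up front.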
The first problem with relational technologies is that they are relational! That means you need to analyse the data and structure it before you begin. The result is that you have pre-determined the scope and the results you will ever get out of the system. This is not valid in a situation where you don’t know what you will be looking for in six months’ time.
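To make the contrast concrete, here is a small “schema-on-read” sketch (the events and field names are invented for illustration): raw records are kept as-is, and the structure is imposed only when a question is asked. Note how a field nobody planned for (“referrer”) appears later without any schema migration.

```python
import json

# Raw events captured as they arrive, with no upfront schema.
# The second event carries a field the first one doesn't have.
raw = [
    '{"user": "u1", "action": "view", "item": "book-42"}',
    '{"user": "u2", "action": "buy", "item": "book-42", "referrer": "email"}',
]

# Parse only at read time; missing fields are simply absent.
events = [json.loads(line) for line in raw]

# A question nobody anticipated at design time: which purchases
# came from an e-mail referral?
email_buys = [e for e in events
              if e.get("action") == "buy" and e.get("referrer") == "email"]
print(len(email_buys))  # → 1
```

With a relational design, answering that question would first require deciding, months earlier, that “referrer” was worth a column.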
The second problem with relational technologies is that the architecture was built for a transaction-processing workload. The “Big Data” workload does not fit a TPC-x model at all. If you attempt to do this work on that architecture and scale it, it will cost a fortune!
I know that might be too simple for some, but in my simple definition “Big Data” is turning data into something that people can use. To expand on this idea we have to acknowledge that the human mind is better at processing certain types of stimuli, and is limited in the scope of what it can process. A 1956 study showed that the human mind can respond to an average of seven, plus or minus two, stimuli at any one time. You know that you can stare at a screen full of numbers all day and not be able to detect the trend or an incorrect entry; however, you can spot a dead pixel on an HD TV in a moment! Even relationships have a finite limit, known as Dunbar’s number, of about 150; above that you cannot maintain stable social relationships.
That is why the techniques in ‘Big Data’ allow us to model and visualise the data… to create infographics that let us grasp the meaning and build on the knowledge we gain.
We are transitioning to a world enabled by real-time analytics! (I will talk about this in the future when I bring this all together.) Today, everyone has compartmentalised traditional and “Big Data” processing into two different computing disciplines, much like on-line and batch processing. The future has to be one where these two worlds work hand in hand, or perhaps one where architectures incorporate both as a fundamental design. This is how the large internet companies operate today; imagine if you purchased a book from Amazon and a week later they sent you a recommendation for another book, would you open the e-mail?
So let’s just summarise this down to ‘Big Data’ = Understanding.
Next: Bringing it All together.