Today the world changes a little more rapidly. The face of Hadoop has changed forever as EMC Greenplum released Pivotal HD, a highly parallelised SQL frontend for Hadoop (details here). Sounds geeky, so what does it mean? The skills shortage in Big Data is solved, and you can use data to improve efficiency, drive profitability and become a predictive organisation. No more barriers, wow!
This is a significant announcement as it addresses the major obstacle to enterprises adopting ‘big data’: SKILLS. In the past, using Hadoop meant learning new skills such as Hive, R, etc. In contrast, the ‘old world’ has a unifying skill called SQL. SQL skills are rife within organisations, even outside of IT, as many people have played with some sort of data manipulation tool. Lots of people have an Access database to track something or other, or have used utilities in programs like Excel which are very SQL-like. More importantly, a plethora of tools, including almost all BI tools, use SQL to access data at the back end. Now all of this accumulated knowledge, all the tools, all the reports and all the analytics can simply be applied to a Hadoop dataset.
(If you are not familiar with Hadoop, you can loosely think of it as a database for unstructured data. So a Hadoop dataset can be any data that exists out there. No relationships required here!)
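To make the SQL point concrete, here is a minimal sketch of the kind of query a BI tool or report might issue. It runs against an in-memory SQLite database purely as a stand-in so that it is runnable anywhere; it is not Pivotal HD's interface, and the `sales` table, its columns and the `run_report` helper are hypothetical names made up for illustration. The idea is that the query text is the familiar part, whatever engine sits underneath.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for a SQL-queryable
# dataset. This is an assumption for illustration, not Pivotal HD itself.
conn = sqlite3.connect(":memory:")

# Hypothetical table and sample rows, invented for this example.
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 30.0)],
)

# The sort of plain SQL an existing report already contains.
REPORT_SQL = """
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    ORDER BY region
"""


def run_report(connection):
    """Run the report query and return a list of (region, total) rows."""
    return connection.execute(REPORT_SQL).fetchall()


for region, total in run_report(conn):
    print(region, total)
```

Anyone who has written a GROUP BY in Access or behind a BI dashboard can read that query, which is exactly the skill the announcement is banking on.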
This announcement is even more significant as Pivotal HD produces a cost-optimised parallel query. In layman’s terms, it’s lightning fast with no human effort applied. In the past, being able to use Hadoop was not sufficient. A good Data Scientist needed to be able to work out how to divide up the work to ensure all the available computing resources were humming … this minimised the time it took to obtain results.
Why does speed matter? Firstly, remember that this is an iterative discovery cycle, and the more times you can go around the cycle, the better the results should be. Secondly, if we are going to operationalize analytics by enhancing people’s workflows, response time is critical! To embed real-time analytics into transaction systems, complex multi-dimensional analytics needs to return results in seconds, and that is what Pivotal HD does, as shown in the graph above.
Now, before you complain that it is not a valid comparison to match a public domain piece of code against a commercial offering: obviously EMC adds value by making the product more robust and faster. So the graph below compares it to another commercial product, and a speed-up of 9x to 69x is massively impressive!
I am excited (as noted by all the punctuation), because this is a big deal. This is literally what the big data world has been waiting for. Just consider that you can plug in Pivotal HD and immediately make all your existing reports run many times faster. You can then expand the data that those reports are running against and get fine-grained analysis done, and then expand into new datasets, and the world opens up! The quest for a single customer view becomes an understanding of a single customer journey… now we are talking!
The new world is here and it’s a lot more exciting than making a purchase order execute faster!