By Clive Gold, CTO Marketing, EMC Australia and New Zealand
How often do you attend an event that has everyone engaged
and participating right up to and even after the end? This morning we had an
event in Canberra around the topic of Big Data. The room was packed, Dr Darrell Williamson (here), Deputy Director of the CSIRO ICT Centre, was fascinating, and I managed not to put everyone to sleep!
Maybe this was just good timing, with Senator Kate Lundy having just announced the “Open Technology Foundation” (here), or maybe it is simply the right time for us to think about how to use the data we have. Either way, it was a great session.
Darrell covered a wide range of applications and issues in which large amounts of data were the common theme. In some cases scary amounts of data, such as the Square Kilometre Array Pathfinder project, which will spew out 72 TB of data every second. That pales into insignificance when you consider this is a 1:100 model of the full-size SKA, which is destined to push out 100 times as much data.
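To put those rates in perspective, here is the back-of-envelope arithmetic. The 72 TB/s pathfinder figure and the 100x scale-up are from the talk; the per-day extrapolation is my own illustration.

```python
# Back-of-envelope arithmetic for the SKA data rates mentioned above.
TB = 10**12  # bytes in a terabyte (decimal)

askap_rate = 72 * TB          # bytes per second from the pathfinder
ska_rate = askap_rate * 100   # full-size SKA, at 100x the pathfinder

seconds_per_day = 24 * 60 * 60
askap_per_day = askap_rate * seconds_per_day

print(f"Pathfinder: {askap_rate / TB:.0f} TB/s")
print(f"Full SKA:   {ska_rate / 10**15:.1f} PB/s")
print(f"Pathfinder output per day: {askap_per_day / 10**18:.1f} EB")
```

That works out to 7.2 PB every second for the full instrument, and over six exabytes a day from the pathfinder alone.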
But Big Data is not just about these massive and specialised applications. Dr Williamson also touched on areas such as sentiment analysis: using Twitter feeds to analyse the public’s feelings about the services they receive from government departments. He spoke about matching grain to the micro-climates around Australia, and many more thought-provoking uses of data!
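At its simplest, that kind of sentiment analysis boils down to scoring text against word lists. This is a minimal sketch of the idea; real systems use trained classifiers, and the word lists and sample tweets here are invented for illustration.

```python
import re

# Illustrative word lists only; a production system would use a trained model.
POSITIVE = {"great", "helpful", "fast", "easy", "thanks"}
NEGATIVE = {"slow", "broken", "waiting", "useless", "frustrating"}

def sentiment(tweet: str) -> int:
    """Crude score: +1 per positive word, -1 per negative word."""
    words = re.findall(r"[a-z]+", tweet.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

tweets = [
    "great service at the passport office, fast and easy",
    "still waiting, the online portal is broken and useless",
]
for t in tweets:
    print(sentiment(t), t)  # prints 3 for the first tweet, -3 for the second
```

Aggregated over thousands of tweets, even a crude score like this can surface a trend in public feeling about a service.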
So for EMC, is this just a ‘needs lots of storage’ play? No. EMC has a very interesting stack of capabilities in this field. Firstly, there is a storage component, but not just more scale on traditional architectures; rather, new architectures custom-designed for this large and/or lots-of-data requirement: namely Isilon and Atmos.
Secondly, there is no point keeping all of this stuff if you can’t use it, so combining Greenplum (for the more structured data) with Hadoop (for the less structured data) creates a unique and very interesting analytical capability.
Lastly, and the most important piece, answering the ‘so what’ question: so what do I do when I’ve worked it out? How do I make changes that let me leverage what I’ve found? The top layer of the EMC Big Data Stack is Documentum’s xCP. If you are not familiar with this technology, xCP is an accelerated composition platform that allows you to effectively paint out a workflow and instantiate it (i.e. compose an application with no coding required, and run it). In this way, you can implement a new application or workflow within days or weeks.
Not only does xCP allow you to change what people do, but through a connector back into Greenplum you can build analytics into your standard workflows. For example, if someone is issuing new credit cards, the workflow could run a real-time query across the available datasets to ensure the application is not fraudulent. That could include a search through Twitter feeds, social networks, demographics, etc. (For example, one bank found that a person kept using addresses within one block of where they grew up to obtain multiple credit cards under different names.)
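To make the bank example concrete, here is a hypothetical sketch of the kind of check such a workflow step might run before approving an application. Everything here — the data model, the function names, and the reduction of "within one block" to a shared street — is an illustrative assumption, not xCP or Greenplum API.

```python
from dataclasses import dataclass

@dataclass
class Application:
    """Hypothetical, heavily simplified credit-card application record."""
    name: str
    street: str  # stand-in for the "within one block" address comparison

def fraud_signals(app: Application, prior: list[Application]) -> list[str]:
    """Return human-readable flags for a reviewer; empty list means no hits."""
    signals = []
    for p in prior:
        # Same street, different name: the pattern the bank anecdote describes.
        if p.street == app.street and p.name != app.name:
            signals.append(f"street {app.street!r} already used by {p.name!r}")
    return signals

# Illustrative history of earlier applications:
history = [
    Application("J. Smith", "Elm St"),
    Application("John Smyth", "Elm St"),
]
for flag in fraud_signals(Application("Jon Smith", "Elm St"), history):
    print(flag)
```

In a real deployment this query would run against the analytical store in real time, as a step painted into the workflow, rather than against an in-memory list.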
So as I sit at Canberra airport waiting for a delayed flight, I’ve just been introduced to another big data application: a service that predicts whether a flight will take off on time! How does it do this? It takes the historic flight data and correlates it with a number of other datasets: performance over the past 24 hours, weather conditions, current delays in the network, major events, etc., and adjusts the prediction accordingly. Unfortunately I didn’t consult this before leaving for the airport, hence the longer-than-normal post! Sorry.
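A predictor like that might blend those datasets roughly as follows. This is a sketch of the general shape of the idea only; the weights, inputs, and numbers are invented, and the real service would be far more sophisticated.

```python
def delay_probability(historical_rate: float,
                      last24h_rate: float,
                      weather_penalty: float,
                      network_penalty: float) -> float:
    """Blend long-run and recent on-time rates, then subtract live penalties.

    All inputs are in [0, 1]; the weights below are illustrative guesses.
    """
    blended = 0.6 * historical_rate + 0.4 * last24h_rate
    on_time = max(0.0, blended - weather_penalty - network_penalty)
    return round(1.0 - on_time, 2)

# A flight that is usually on time (85%) but had a rough last 24 hours (60%),
# with poor weather (-0.10) and network congestion (-0.15):
print(delay_probability(0.85, 0.60, 0.10, 0.15))
```

Even this toy version captures the key point: the historical baseline alone is not enough, and folding in the live signals can flip the prediction.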