By Clive Gold, CTO Marketing, EMC Australia and New Zealand
EMC Greenplum has just released a new version of its code (here). Although it is a minor release, it seems to me to be a major step toward the promise of the Greenplum Unified Analytics Platform, and hence the future of big data analytics. Some thoughts on the features:
– Higher database performance and In-database analytics:
For years to come there will be an appetite for performance, to enable the operationalizing of ‘big data’ analysis. The search for ways to process more data in less time, whether by boosting the base database or by ‘embedding’ code into the Greenplum database, will continue. (In-database analytics is a bit like stored procedures in relational terms.) The interesting part here is that these extensions are available to Greenplum customers, via a free download, from a library of useful routines, which means that the data scientist can concentrate on using these tools rather than creating them!
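To make the stored-procedure comparison concrete, here is a minimal sketch of the difference between pulling rows out to analyse them and letting the database do the work. It uses Python's built-in sqlite3 purely as a stand-in (it is not Greenplum, and nothing here comes from the Greenplum routine library); the `sales` table and the function names are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory table (hypothetical data, sqlite3 as a stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("north", 10.0), ("north", 30.0), ("south", 5.0)],
    )
    return conn


def average_client_side(conn):
    """Pull every row out of the database, then compute the averages here."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        running_sum, count = totals.get(region, (0.0, 0))
        totals[region] = (running_sum + amount, count + 1)
    return {region: s / n for region, (s, n) in totals.items()}


def average_in_database(conn):
    """Let the database compute the averages; only the results are returned."""
    rows = conn.execute("SELECT region, AVG(amount) FROM sales GROUP BY region")
    return dict(rows)


if __name__ == "__main__":
    db = build_demo_db()
    print(average_client_side(db))
    print(average_in_database(db))
```

Both functions give the same answer; the second simply moves far less data, which is the whole appeal of running the analytics where the data lives.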
– High-performance gNet for Hadoop:
High-speed computing is all about removing bottlenecks; in general, a system with 100% CPU utilisation is the goal. (As soon as the CPU is idle, execution time extends, because there is a bottleneck somewhere else.) In a big-data analytics system, the movement of data is key to performance, and the heart of Greenplum is gNet. gNet pumps data around to keep the processing nodes fed. Extending Hadoop to use gNet as well creates an environment where a query can be fired off at both of these technologies!
How cool would it be if there was a layer above this that could decide on the best analytics system to process a given query, and then move it to the right place? (Pure speculation, as I have no visibility into the roadmap.)
– Migration from other databases to Greenplum Database:
How obvious is this idea? Converting data from one format to another is a highly parallel task, so here the system uses MapReduce to quickly exchange data between Hadoop and Greenplum. (Personally, I’d like this for video rendering!) And as Intel gives us more cores and more can be done on the fly, the concept of ETL could become an ‘in-line’ component of using data that happens to be in the wrong format, making all data sources available from all tools.
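Why is format conversion so parallel? Each record can be converted without looking at any other record, so the work splits cleanly across however many workers are available. The sketch below illustrates just that map step, using a Python thread pool as a stand-in for a cluster. It is not how Greenplum or Hadoop implement the exchange, and the `id,name,amount` record layout and the function names are assumptions made for the example.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def convert_line(line):
    """Map step: turn one 'id,name,amount' CSV line into a JSON string.

    The three-field layout is a hypothetical example format.
    """
    record_id, name, amount = line.split(",")
    return json.dumps({"id": int(record_id), "name": name, "amount": float(amount)})


def convert_all(lines, workers=4):
    """Convert every line independently; worker threads stand in for cluster nodes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input lines
        return list(pool.map(convert_line, lines))


if __name__ == "__main__":
    for converted in convert_all(["1,alice,9.5", "2,bob,3"]):
        print(converted)
```

Because no line depends on another, adding workers (or cores) needs no change to `convert_line`, which is what makes the ‘in-line’ ETL idea plausible as core counts grow.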
– Language and compatibility enhancements:
In the relational database world SQL is the language; however, there is no equivalent in the analytics world! So as a transition strategy there is a need to bring the skills from the old world to the new, and extending compatibility is vital to speeding this up.
– Efficient backup and recovery:
Without specifically commenting on these features, I see them as an indication that these technologies are moving from more of a ‘science experiment’ into mainstream enterprise computing. I would argue that this is a major value EMC brings to the organisations it acquires. After all, the world has relied on EMC for many years to look after its most critical data!
So although this was a ‘point release’, it seems to me the Greenplum Unified Analytics Platform has taken a huge leap ahead of the market in bringing big data analytics into the mainstream of enterprise computing.