Monthly Archives: February 2013

Mainstream Hadoop for you, no excuses anymore! Time to get on with Big Data

Today the world changes, and more rapidly than ever. The face of Hadoop has changed forever as EMC Greenplum released Pivotal HD, a highly parallelised SQL front end for Hadoop (details here). Sounds geeky, so what does it mean? The skills shortage in Big Data is solved, and you can use data to improve efficiency, drive profitability and become a predictive organisation. No more barriers, wow!

This is a significant announcement as it addresses the major obstacle to enterprises adopting 'big data': SKILLS. In the past you needed new skills to use Hadoop, such as Hive, R, etc. In contrast, the 'old world' has a unifying skill called SQL. SQL skills are rife within organisations, even outside of IT, as many people have played with some sort of data manipulation tool. Lots of people have an Access database to track something or other, or have used utilities in programs like Excel which are very SQL-like. More importantly, a plethora of tools, including almost all BI tools, use SQL as the way to access data at the back end. Now all of this accumulated knowledge, all the tools, all the reports, all the analytics can simply be applied to a Hadoop dataset.

(If you are not familiar with Hadoop, you can think of it as an unstructured database. So a Hadoop dataset is any data that exists out there. No relationships required here!)
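To make the "your SQL skills carry over" point concrete, here is a sketch of the kind of query an analyst or BI tool already writes. Against Pivotal HD you would submit it through an ODBC/JDBC driver; here Python's built-in sqlite3 stands in so the example is self-contained, and the table and column names are hypothetical.

```python
# Ordinary SQL of the kind any analyst already knows. The engine behind it
# changes (here sqlite3 as a stand-in; in Pivotal HD, a parallel SQL engine
# over Hadoop data), but the skill and the query do not.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clickstream (customer_id TEXT, page TEXT, visits INTEGER)"
)
conn.executemany(
    "INSERT INTO clickstream VALUES (?, ?, ?)",
    [("c1", "home", 5), ("c1", "cart", 2), ("c2", "home", 1)],
)

# The same aggregate a BI tool would emit against its back end:
rows = conn.execute(
    "SELECT customer_id, SUM(visits) AS total_visits "
    "FROM clickstream GROUP BY customer_id ORDER BY customer_id"
).fetchall()
print(rows)  # [('c1', 7), ('c2', 1)]
```

The point is that every existing report or dashboard built on SQL can, in principle, be repointed at Hadoop-resident data without retraining anyone.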

This announcement is even more significant as Pivotal HD produces a cost-optimised parallel query. In layman's terms, it's lightning fast with no human effort applied. In the past, being able to use Hadoop was not sufficient; a good Data Scientist needed to work out how to divide up the work so that all the available computing resources were humming, which minimised the time it took to obtain results.


Remember that this is an iterative discovery cycle: the more times you can go around the cycle, the better the results should be. Secondly, if we are going to operationalise analytics by enhancing people's workflows, response time is critical! To embed real-time analytics into transaction systems, complex multi-dimensional analytics needs to return results in seconds, and that is what Pivotal HD does, as shown in the graph above.

Now, before you complain and tell me it is not a valid comparison to take a public-domain piece of code and match it against a commercial offering: obviously EMC adds value by making the product more robust and faster, so the graph below compares it to another commercial product. Going from 9x to 69x is massively impressive!


I am excited (as noted by all the punctuation), and this is a big deal. This is what the big data world has been waiting for. Just consider that you can plug in Pivotal HD and immediately make all your existing reports run many times faster. You can then expand the data that those reports run against and get fine-grained analysis done, then expand into new datasets, and the world opens up! The quest for a single customer view becomes an understanding of a single customer journey… now we are talking!

The new world is here and it’s a lot more exciting than making a purchase order execute faster!


What's the real impact of consumption-based pricing?


I've spent the last few days in Maroochydore at the annual IT-Journo gabfest put on by MediaConnect. It's been years since I attended this forum, and I picked up a couple of interesting ideas.

Firstly, there are so many start-up Australian companies doing some fascinating things. Xero was there, which was a bit weird as I'd just seen them in Melbourne, next door to an EMC event at Crown. The reason I remembered them is their tag line, "Beautiful accounting software", which to me is an oxymoron like 'military intelligence'. They provide a cloud-based system, but what is interesting is their approach to partnering… they publish an API to allow others to add services onto their core platform. This has attracted a raft of other organisations to differentiate their service by building an add-on.

Secondly, and a little more subtle, was something said by a guy who sells 'subscription' technology (the stuff people who set up a subscription-based business use to transact). What he said was, "The move to consumption-based pricing is a move from product to relationship." I know I'm slow, but this was an aha moment for me!! I suddenly realised that this trend is not just a different way to pay for something (which is the way a great many IT companies have viewed it), but the beginning of a completely new business model. Now, we all understand that the change to consumption moves the focus from features to outcome, but how does this play out in the future?

So in my mind the question is: if I subscribe to a service that meets my needs today, and my needs evolve over time, should I expect my current supplier to evolve their service? I'm guessing that the successful ones will, and they will be the winners. Let's face it, perhaps this is the biggest issue with traditional outsourcing arrangements, where the requirements and the contract slowly drifted apart… leaving both parties unhappy!

Friday rant: Epic Fail by TNT leads me to Big Data!

A bit of a Friday rant of when technology, people and process don’t work together.

My sad story starts on Tuesday, when I ordered a router from Apple, which I need for an upcoming seminar. Apple processes the order and gets TNT to dispatch the item very efficiently; it is dispatched overnight. Error 1… no notification, so no-one is home!


Wednesday night I get home to find the 'Sorry we missed you' note, including a phone number and a web address to arrange re-delivery. Being the geek I am (and given it is after hours), I duly log on and arrange the delivery for Friday, so I can work from home and be there to sign (a requirement).

Errors 2 and 3… I get to work on Thursday and receive an SMS that the package is on its way! Why two errors? Well, first they got my on-line request, as they now have my mobile number, but then they ignored the request to deliver on Friday.

I then arrange for a family member to stay at home to receive the package, and yes, you guessed it… no delivery. Error 4: their website shows that they attempted delivery and left a new 'Sorry we missed you' card… but no one knocked on the door, and there is no note!! (Something smells wrong here!)

Error 5: this morning I check their tracking website and the item is not on the truck. So I call up to find out what is going on… my details are taken down and I'm told that 'Customer Services' will get back to me immediately… Two hours later I call again… Error 6: I am told they are not coming to my area again, so I'll have to be there on Monday to receive the package!!

Now, I know it's dangerous to insult the waiter before the food has been served! However, after inconveniencing me for two days, they had no qualms about telling me to stay at home for a third. I queried all the errors above and no explanation was given; in fact, there was no interest in 'customer service' whatsoever! Fascinating was the attitude that this is all my problem and nothing to do with them… how wrong are they?

Now, obviously, if I ever need a courier I am not going to choose TNT. However, this loss of revenue is of little consequence to them, given the number of times in my life I will use their services. But what about from a cost point of view! In the end they will have handled the package four times and delivered it, allegedly, three times! Unless I'm totally confused about the margins in the logistics market, I can't see how they can turn a profit on this. The time I had to wait for the phone to be answered and the number of complaints I found on the web indicate this is not an isolated event.

(I am not saying that TNT does not run a good operation. With customers like Apple, you can imagine the massive volume of packages, and at this scale even a low percentage of errors leads to a lot of on-line noise!)

My question is, what are they doing to move towards zero incidents? My impression from my interaction is: nothing! But I have to believe that the multiple handling and error-processing they do must be costing them millions, which, if resolved, could drop straight to the bottom line!

Just consider if they collected and collated all of the information they have about these incidents. Not easy, as it's different types of data and it's all over their organisation. How hard would it be to pick up inefficiencies: the reasons for 'not at home' deliveries, the reason why customers report no delivery was attempted when the system says it was, a marketing program to promote 'deliver to work'… oh, so many ideas… Just another big data opportunity: internal efficiency!

A Battle-ground in the SDDC War!


The current battlefield in the cloud technology war seems to be the orchestration layer! For the EMC family, VMware purchased DynamicOps mid last year, and EMC sneaked in the purchase of iWave late in 2012.

I met with the head of Infront, the leading EMC partner in this area, who sees a great many customers battling to make the transition to IT as a Service. Then I get back to the office and bump into a pack of consultants out here from our Cloud Services consulting group, on a 10-week assignment helping a major outsourcer develop this layer.

What's the issue? Well, it seems far harder than first expected to create a fully automated, self-service IT infrastructure. Some of you will say that is rubbish, there are catalogue and self-service products out there, but that just tells me you have never run an IT operation. Delivering the end-to-end service reliably and meeting the SLAs for each application consistently… requires more than a layer which enables click-and-provision! Never mind how you move your organisation into this model: people skills, organisational structures, financial transparency, costing and billing… to name a few other little issues.

What's the answer? I have no magic-bullet idea here! Sorry, probably just hard yakka for the software developers and the education specialists, and a boat load of money for the consultants!

However, in my cynical way, I wonder if the talk about 'The Software Defined Datacentre' is allied to this challenge. After all, if we have the same logical control over hardware resources as we have over software resources, end-to-end orchestration of the environment does become more achievable. Just remember that the conductor is there to ensure that all the musicians do what they need to, and when they need to do it; the magic only happens when everything works together in exactly the right way!

The IT Kick-off is dead, what a pity!


In the early days of the IT industry it was customary to get the sales team together at the beginning of the year to gee them up and get them going for the year.

If you haven't worked for a vendor before, it's a bit like a year-long Groundhog Day. Each year you get a new number; you look after your customers and find new customers in order to make that number. When you make the number you get congratulations, a slap on the back and a shake of your hand, as the other hand receives the new number!! Having sales meet their number is essential, as it enables the vendor to invest in innovative new technologies, which helps customers create value for their 'customers'… which is the virtuous cycle!

So the kick-off is a combination of the congratulations and kicking off the year! (Arm the sales force with all the new goodies and new messages and send them on their way.) In the old days these were 'big' events! (You can imagine the personalities involved in an entire sales force, combined with end-of-year elation and fuelled by cold beverages!)

However, in this rapidly consolidating industry, the major vendors are now large and the accountants have taken over! Now, I understand the sticker shock when they look at the cost of doing this; what I don't understand is that the larger the event, the lower the per-person cost. So why was it worth investing in the people when a company was small, but not now that it's large?

More of a concern is this: if that infusion of the organisation's culture and message led to the organisation's success, what is going to happen now?

Storage is not Snorage This Year!


A well-known Australian journalist, one of the doyens of IT news, coined the term "Storage is Snorage" about 10 years ago! For the most part he has been right! EMC and the industry have enhanced the hardware by riding the price/performance curves of the underlying components… disks have grown 1000x bigger, processors 10,000x faster and RAM cheaper. However, until recently, the basic architecture has not really changed!!

But that has all changed and two new architectures reach prime-time this year!

The first is scale-out. Yes, I know Isilon has been in the market for over 5 years, but in more niche application areas. Now two trends are converging: firstly, mainstream computing environments are experiencing massive growth in unstructured data while the traditional architectures are creaking, and secondly, Isilon now has 'Enterprise' features.

A quick word of warning when you look around: the value is in the architecture, not the fact that there is a single file system! I say this because whenever there is a major advance in technology, you get the 'horseless carriages': people who take the old technology, substitute some part of it and think it's all new. (Or, less kindly: you can put lipstick on a pig, but it's still a pig!) The reason I make this point is that 'traditional' storage, with its RAID groups and LUN size limits (or aggregates), is the source of the management nightmare when you scale to the PByte level. Putting a wrapper or layer above this does not remove the management and other overheads. To do scale-out properly, you need to design from the ground up.
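A back-of-envelope sketch of why the wrapper approach doesn't help at scale. The numbers below are illustrative, not vendor specs: assume a hypothetical traditional 2 TB LUN limit and see how many objects you end up carving, mapping and monitoring for a petabyte, versus the single namespace of a ground-up scale-out design.

```python
# Illustrative arithmetic only: a 2 TB LUN limit is an assumed figure,
# not a quote of any product's actual limit.
PBYTE_TB = 1024           # 1 PByte expressed in TB
LUN_LIMIT_TB = 2          # hypothetical traditional LUN size limit

luns_to_manage = PBYTE_TB // LUN_LIMIT_TB
print(luns_to_manage)     # 512 LUNs to provision, map and monitor

scale_out_namespaces = 1  # one file system, however large it grows
```

Layering software on top of those 512 objects hides none of the provisioning work; designing the file system around the whole pool removes it.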

Talking about ground-up design brings me to the second exciting architecture this year, the All-Flash array. Once again, all storage designed before this had one design consideration: 'locality matters!' Because of mechanical drives, the position of the head is a major determinant of performance and throughput. (Throw random requests at a disk drive and it will perform like a stuck pig; order them and it screams!)
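The stuck-pig-versus-screaming point can be shown with a toy model: total head travel for the same set of requests, serviced in arrival order versus sorted into a simple elevator-style sweep. The track numbers are made up, and real drive schedulers are far more sophisticated, but the ratio makes the locality argument.

```python
# Toy model of disk head movement: travel cost is the distance between
# consecutive track positions. Same requests, two service orders.
requests = [98, 3, 47, 81, 5, 60, 12]  # arbitrary example track numbers

def head_travel(order, start=0):
    """Sum the head's movement when servicing tracks in the given order."""
    travel, pos = 0, start
    for track in order:
        travel += abs(track - pos)
        pos = track
    return travel

arrival_order = head_travel(requests)          # service as they arrive
swept_order = head_travel(sorted(requests))    # one sweep across the platter
print(arrival_order, swept_order)              # 450 98
```

On flash there is no head to move, so this entire class of optimisation, baked into every disk-era array, simply stops mattering.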

Now, if you start from scratch and design around a storage medium that has no locality (no penalty for writing or reading from anywhere), that is a major change in design. Secondly, RAID becomes an absurd notion; there is no 'disk' involved, just an address space!

“The Tale of Two Transfers” or “The New WWW!”

EMC recently released an enterprise version of its Syncplicity acquisition (here), which struck home with these two tales…

Transfer One.

I got to work early to write this blog post, and someone was having trouble getting a video file to a colleague. (Our internal e-mail limit is 20MB.) Being the go-to guy for any support in our office, I got stuck in… found various shared drives to get the file as close to the recipient as we could. We emailed the link (with the standard 'space in the address' issues) and eventually got the file there.

Transfer Two.

My wife has gone back to full-time work, and we thought it would be a good time to help the 18- and 21-year-old become more domesticated. The plan is that every one of us will prepare at least one meal a week (share the load and acquire an essential life skill at the same time!). However, there was one problem: it does not make sense for four people to go out shopping, so my son found a shopping list app. The real kicker with the app is that it keeps the same list current on all our mobile devices! So instead of running around trying to tell everyone what we need, we simply add it to the list.
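At its core, what that app does can be sketched in a few lines: fold each device's local additions and removals into one shared, current list. This is a naive set-union/difference model with made-up data, not any real app's sync protocol (which would also have to handle conflicts and offline edits).

```python
# Minimal model of a shared shopping list: each device submits its local
# additions and removals, and everyone sees the merged result.
master = {"milk", "bread"}

def sync(current, added, removed):
    """Fold one device's local changes into the shared list."""
    return (current | set(added)) - set(removed)

# Dad adds eggs on his phone; son buys (removes) bread on his.
master = sync(master, added={"eggs"}, removed=set())
master = sync(master, added=set(), removed={"bread"})
print(sorted(master))  # ['eggs', 'milk']
```

The value is exactly the point of the story: the shared state lives in one place, so nobody has to move a "file" around to keep four people coordinated.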

What's the point? Well, two different approaches to essentially the same problem: information sharing! The issue here is that approach one is so last year! The notion of moving data around as files is not relevant in today's world, for two major reasons… synchronisation and size!

Most professionals in Australia now have two mobile devices plus their laptop/desktop! Keeping productive is the new WWW (What, Where & When I want it), which means accessing your information from whatever device is at hand at the time. Synchronisation is a major issue in all of our lives! Cloud-ifying your information is one way to solve this problem, however you have to choose sides: Google, Microsoft or Apple… and it's extremely difficult, if not impossible at this stage, to make them all play nicely together.

Secondly, there is the problem of sharing with someone else! Enter a myriad of other services, the most commonly used (and most commonly blocked by corporate networks) being Dropbox! However, the Syncplicity solution brings a new flavour into this mix: corporates can house their data in-house and provide their own secure and protected 'cloud' service from within their datacentre, if they want!

So roll on the standardisation of 'cloud', when I can have one diary across my Windows laptop, my iPhone and my Android tablet… do I hear someone say dream on!!