Change Focus from Cost to Value via Patient Outcomes!


I’m a TED fan, and if you are not aware of TED you need to be! To whet your appetite, invest under 13 minutes to watch this lecture, here. Stefan Larsson, (not to be confused with Stieg, although some parallels may exist with the Millennium series!), describes the reasoning behind the ICHOM initiative, which according to its site:
“The International Consortium for Health Outcomes Measurement (ICHOM) is a non-profit organization founded by three esteemed institutions with the purpose to transform health care systems worldwide by measuring and reporting patient outcomes in a standardized way.”

Here is the simple concept:

The problem is that we have been measuring the cost and using that as the metric, but how do you gain a measurement of the ‘outcomes’? That is the role of ICHOM: to measure the outcomes and create the benchmarks, as well as find best practices. (If you didn’t watch the lecture, he gives examples in hip replacement and prostate surgery.)

The key message is that wherever there has been a focus on improving patient outcomes, the costs have dramatically dropped, not too much of a surprise there! I’m guessing that your immediate reaction is, ‘That is all good, but who is going to do all this data collection work?’ Interestingly, their answer is to use the data that should already exist in patient records, as well as to involve the patients themselves: reuse and distribution of workloads.

I only have one question: if we have been benchmarking in enterprises for decades, how come this is a new concept in healthcare? There are numerous benchmarking organisations in various sectors who study a multitude of issues, collect data and publish the benchmarks for these aspects.

I think the answer is simple: in healthcare it’s not that easy! In commercial organisations there is a relatively small set of quantitative ‘variables’, and mostly they revolve around PROFIT! This may include derivative measurements of cost/efficiency/productivity. However, in healthcare the inputs are both numerous and not always quantitative, but today that is no longer a barrier.

Love it or hate it, the ‘Big Data’ revolution taking place has produced technologies and methodologies to compute with ‘subjective’ data! Now measuring patient outcomes and the factors that affect them can be mechanised, and thus reasoned over, to improve the ‘value’ within our healthcare system.

Now, while Australia is participating in ICHOM’s work, I wonder how much impact their results will have on our system as a whole.


Can Technology Transform Healthcare?


How many times in your life do you have an opportunity to make a real difference? I think I have one of those! EMC has asked me to look at how we can help transform healthcare with the technologies we create and the way our customers innovate with them.

Why do I think it’s such a massive opportunity? Because all the elements are there: an urgent requirement, existing proven solutions and a compelling direction.

The Urgent Need
There are many forces driving a structural change, from individuals’ demands to the availability of skills, etc. But let me just paint the dire financial picture. Over the last decade in Australia, the growth in healthcare spending has been about double GDP growth. If GDP is an indication of the tax base, then more and more of government spending is required for healthcare. Now add in the fact that the majority of health costs occur in the latter stages of life, and these costs are going to grow even faster in the future as the baby-boomers push the population age up! There is a problem today and the diagnosis is bad: this is an unsustainable situation, and something has to change!

Existing Proven Solutions
I believe that the answer is in transforming the system using technology, (no surprise to anyone who reads my blog.) Let me outline what I see as the major trends and how technology is vital to these:

–          From Hospital to Home.
My father was a radiologist and he used to say ‘Don’t go to hospital – more people die there than anywhere else!’ And if you study the statistics, hospitals are becoming more dangerous places, as more people contract new diseases and complications during their stays. The point I want to make is that this method of care is like a mainframe, a time-shared resource that you have to go to, but is this the optimal model? In computing terms we are moving to the second generation after this model, due to better utilisation, greater efficiency and lower costs of computing. Surely healthcare infrastructure must transform away from this mainframe model as well?

–          From Consultation to Collaboration
When I grew up we had a family doctor, and he was almost part of the family. He knew my grandfather, (also a doctor), and knew me from the moment I was born until I left the country; I don’t think I saw another doctor! He knew everything about me, not just my medical history, but my lifestyle, (rugby injuries), and my neighbourhood, (he lived a couple of blocks away). Today, especially as you age, the number of clinicians a person consults with is growing, while there is little to no collaboration between these specialists.

More fascinating, (as I grew up in a radiological darkroom), is that although all x-rays are taken digitally, the patient invariably walks out with a film in their hand! Surely healthcare information must be accessible, shareable, and persistent?

The Future of Healthcare or “From Prognosis to Prevention”

Today healthcare diagnoses and treats; tomorrow we will analyse and avoid! The most promising outcome of technological advancement, as well as the most fascinating, is to truly understand how our bodies function, and from this knowledge to be able to avoid getting sick and to stay strong all through our lifetimes. I used to be confused: I thought the practice of medicine was a science; however, today for the most part it’s an art. But as we gain an understanding, at a genetic and molecular level, of how the body works, the practice becomes a science, a science of ‘wellness’!

The only issue with this is the magnitude of the data we are dealing with; we are simply drowning in data. The massive amount of research data that is published on a daily basis is way beyond the practitioner’s capability to ingest, and so diagnoses are made that are not based on full knowledge. For the individual, we are creating ever-increasing amounts of data, from wearable technologies to the tsunami of sensor data. Surely gaining meaningful use from all this data is the key to transforming the quality of the healthcare system?

I invite you to join me on this journey, to share your thoughts and let’s make a difference together!

EMC Strategy Update 2014 – Gateway to the Future.

ViPR signals the future of computing, and now I understand why!

First a confession: sometimes it takes me a while to fully understand the impact of some technologies. I remember seeing the first iPod adverts and pondering why anyone would want to carry a hard disk drive around in their pocket! Likewise, when I first encountered ViPR, I thought: a neat way to manage storage, but it’s not going to change the world? Like the iPod, I have come to understand that this is industry-changing. Big statement; let me explain.

ViPR has two major components, a controller and data services. The controller has had a lot of focus, as it was the most built out at release time. Fundamentally it provides virtualised storage and automated management across your whole environment. This gives you visibility into all your storage and a consistent way to manage it, resulting in lower costs and higher reliability. If you were sceptical you would say this is just the next generation of storage virtualisation, and it would be hard to argue with that.

Now before highlighting the revolutionary power of the ViPR data services let’s make sure we are on the same page, with respect to the shift in IT technology that is currently underway. Analyst group IDC puts it succinctly as the movement from the 2nd platform to the 3rd platform. (Depicted below)


This is the movement to an infrastructure capable of servicing billions of users with millions of apps, (driven by social, mobile and big data computing), and it will look very different to current infrastructures. Enter the Software Defined Datacentre, where we use software to manage and control these elements, (the ViPR controller). More importantly, to gain the scale and elasticity required, a new hardware construct is needed!

One example is illustrated by EMC’s acquisition of ScaleIO. ScaleIO presents a virtual storage array that is built from the storage in the servers that participate. Surely this competes directly with EMC’s core storage business today? Yes, maybe, but if I need 1000 engines driving a massively parallel workload, I can’t achieve that simply with the hardware-resilient architecture of ‘traditional’ storage arrays. While scale-out architectures like Isilon scale to hundreds of nodes, ScaleIO grows to thousands to support 3rd Platform requirements.

So re-think ViPR in this context: today I am firmly in the 2nd Platform and I implement ViPR to gain control, lower costs and improve availability. Then I get a request to support a 3rd Platform application, let’s say Hadoop. Do I rush out and purchase dozens of servers, or do I plug the HDFS Data Service into ViPR and support it immediately out of my existing hardware infrastructure?

Here was my ah-ha moment… as I grow my 3rd Platform services, I deliver these as data services against existing hardware today, and move into specialised or commoditised hardware infrastructures later, depending on other factors, but without disruption! Now ViPR becomes a mechanism for me to co-exist in these worlds and move between them as need be. (After all, there are still a lot of mainframes/1st Platforms in use today!)

So if I’m right, what would you expect to see from EMC? Expect more ‘Data Services’, which will look like virtual versions of the current ‘hardware’ products that exist today!

Xtrem Hotcakes!


WOW, the talk internally at EMC is that we might have become the market leader in all-flash arrays last quarter! What’s so impressive? The product went GA halfway through the quarter!! What’s so Xtrem’ly hot about this product?

Consistency… it simply does what it advertises to do, and it does it all the time!! Let me explain: it’s like looking at the fuel consumption of cars. You read the specification and it shows the absolute best consumption, which in reality is unachievable, (or at least for me). Well, it turns out that conservative EMC publishes the ‘on-road/real’ numbers, where the others play the theoretical specs. Like a car, the only way to work out the real numbers is to fill it up and run it for a while, (perhaps putting your foot down occasionally)! That is what you should do when you test an all-flash array:
–          Fill it up: to get an indication of the true usable capacity.
–          Run it for a while: to see what happens to performance over time.
–          Put the foot down: to see what happens!

On the last point, I’m guessing you are looking at all-flash for performance, so you have to stress the box to see what happens. It’s not easy, because you aren’t used to your servers being the bottleneck, so get some beefy servers and load it up! I am warning you that you might be disappointed at what happens: on some arrays, as they get busy, services get shut down and the array goes into a catastrophic spiral, bleeding capacity and/or performance.

At SNIA we spent a great deal of time working out ways to test and classify the performance of flash devices to enable you to compare them. The reason this work had to be done is the way flash as a medium works: essentially it’s page-oriented and needs some sort of garbage collection, it requires wear levelling to improve durability, etc. If you are interested, SNIA has all the information in the Solid State Storage Special Interest Group, (here).

We’ve seen this behaviour before. When I started at EMC I was responsible for introducing a new way to connect to storage called Storage Area Networks, or SAN for short. (Yes, I’ve been here for that long!) EMC, as always, tested out the full configuration, did the eLab job and published the real-world numbers. We got a shock when we saw the competitor’s numbers: a factor of about 100 higher!! An interesting trick: they had worked out that the fibre channel chips had a small buffer in them, so their test wrote a small piece of data to the chip and then read it back again. Fantastic, wasn’t it: absolutely valid, as they were writing to and reading from the array, while at the same time being absolutely useless as a measure of what would happen on your site.

I used to work with a guy, George Z, some of you will know him, who had a way of giving you an absolutely accurate answer which was at the same time completely useless. Don’t get caught by this; it could be costly!

My Big Data Journey – Getting serious, or my pivotal moment!


In my last two posts I went through the basic tools and getting to the data; after having had a bit of a play, it was time to get serious!

As with a lot of things, they seem simple on the surface, but as soon as you drill into them they become more complex! Playing with a dataset of thousands of records, I noticed that the ‘runtime’ was extending, and I am not a patient person! So it was time to fix this. The first issue was that my Python setup was single-threaded, so while my Intel i7 has 4 cores and 8 threads… very little was being used. So I downloaded and installed iPython, a great interactive environment which also has a simple parallelisation mechanism, allowing me to spin up a number of workers and make use of the extra CPU power. Great…

For a while this kept my wait-time down to tolerable levels, but as my ‘maths’ got more complex and my datasets grew… things ground to a halt! (You can imagine, when the machine started thrashing into virtual memory, it was way over my tolerance level!) So it was time to pull out the big guns!! Firstly, reading the data into a single data structure was becoming an issue, so the obvious answer was to divide and conquer: enter Hadoop.

Well, I started using this great article, (here), by Brett Winterford, (leading IT journo and a great muso… YouTube him if you are interested), who challenged the open guru Des Blanchfield to create a tiny-Hadoop. The article steps you through building a Hadoop environment in about an hour, which uses about 500MB! (I had a little problem where the daemons did not start; after a few hours I worked out some of the config files were incorrect, otherwise it was pretty much as per the instructions.) A great introduction to Hadoop, but not really enterprise-ready… so I moved on.
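(For anyone hitting the same daemons-won’t-start problem: in a stock single-node setup, the two files most commonly at fault are core-site.xml and hdfs-site.xml. A typical pseudo-distributed pair looks roughly like the following; the port and values here are the standard Hadoop defaults for this mode, so adjust them to your install.)

```xml
<!-- core-site.xml: tell clients where the single HDFS namenode lives -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml: a one-node cluster can only hold one replica per block -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```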

You guessed it from my lousy pun in the title… all roads lead to Pivotal, where you can find a downloadable version of both Pivotal Greenplum and Pivotal HD with HAWQ, the parallel version of standard SQL, which is just SQL on steroids! Now let me warn you, these are GB downloads, but well worth the bandwidth. These are single-node versions which are great for getting a taste of these amazing tools. However, (and I hope this is not a secret), a couple of weeks later I got sent a mail with a link to a cluster version!!

So that’s where I’m up to. My biggest learning, which surprised me a bit, was not about the technologies and methods at all, but about moving from ‘hacking/playing’ to ‘production’! At the beginning of this exercise I thought the issues all revolved around the ultimate algorithms and absolute performance. However, by the end I realised that it’s about the application of these tools, and as always, to operationalise these systems the complete lifecycle is more important than how many ‘rows’ I can process in a second!!

Lastly, get going: the only thing to fear is fear itself!

My Big Data Journey – Where’s the Data? or Can Australia be innovative without open data?

In the last post I discussed the basics of getting practical Big Data experience. Now armed with some tools I needed some data to work on!

Well, the site I mentioned last time has lots of data you can use, including data from the Titanic if you want some fun! They even teach you how to sample the dataset, in order to have ‘fresh’ data to test your methods/models/algorithms against. But I wanted to play with live data… so before we begin, who is the guy in the picture? A couple of hints: he ran one of the biggest Big Data projects ever. He got some guy elected.
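(That sampling trick, holding back a slice of the data your model never sees during development, is simple enough to sketch in a few lines of plain Python. The function name and fractions here are my own illustration, not anyone’s official API.)

```python
import random

def holdout_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and hold a fraction back as 'fresh' test data."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)

train, test = holdout_split(range(100))
print(len(train), len(test))  # 80 20
```

You fit on `train`, then score against `test`, data the model has never been shown, so the score isn’t flattered by memorisation.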


His name is Harper Reed and he was the CTO of the “Obama for America” campaign. (That was Obama’s second election!) Some of you might have seen him, as he was in Australia last year doing the speaking-circuit thing. However, before the Obama campaign, Harper was the CTO of Threadless in Chicago, (a story for another time), and living in Chicago he did a few interesting things. Like hack into Chicago’s transit authority and make their bus data available, which you can use to get the position data of the buses travelling around Chicago right now! And while you are ‘in Chicago’ you might want to see where the city spends its money. (Some ‘goofy’ payments are highlighted on the site!) And I even found their food health inspection data, so if you are ever visiting and want to see where not to eat, this is a goldmine of information.

So, in awe of the access to data in the US, I wondered if I could get some interesting data closer to home, so I started with transport. I did find a couple of interesting things. Firstly, I take the train to work fairly often and use an app called TripView, which shows me the train timetable. I noticed last year that they started showing on-time running and delays in real time, so I guessed the data feed must be out there somewhere.

My search revealed that it seems only a few organisations have been granted access to the API, but in this open and connected world, is that right? If you think that access to public data should be made open, you can vote by signing this petition, (here), to get the links opened up! The irony of this new data-driven and connected world is that there are always people out there who will find a way, and some people have. They have accessed the interface between the mobile app and its data sources… I lost interest, but if you would like to go further, have a look here. To me the interesting point is old vs. new thinking: people living in what IDC calls Platform 2, (client/server), while the world is moving to the social, mobile and big data world of Platform 3!

Moving on, (sorry), I did find that the old RTA, (now the new RMS), has granted access to data, such as travel times on the old F3, (now the M1), from Sydney to Newcastle. (Which goes to prove that you can change the name, but the traffic nightmare remains!) Their site offers data in all sorts of forms, including an API which you can access for the cost of your e-mail address.

There are other sources as well, but as I’m past my self-imposed word limit, we won’t go there. In summary, in Australia some data is out there in a useful form, but so much is not open, and without open data how can Australia become an innovative country in today’s world?

My Big Data Journey – The Basics!


It is All Out There!
Oh wow! There is so much out there to help you get going in Big Data… it’s almost a Big Data problem in itself.

My first step was to brush up on my programming/language skills, since it’s been years since I’ve seriously written any code. I’d been playing around with Python for a little while, so I naively Googled “Python for Big Data” and got only 29.4 million hits… so I took a course to sharpen my Python skills, then followed a Python for Big Data tutorial given via YouTube, and I was almost ready to go!

Lastly, to get some practical experience, I dropped by, signed up and went through a couple of their tutorials, with datasets and advice. Now I’m not a data scientist, but I do have a taste of what they are up against in a practical sense.

You Don’t Need to Know The Math!
This will probably get me into trouble! However, I suggest that you don’t need to know how the algorithms work; you just need to know what they do! If you are not a Python person, then you should know that there is a myriad of libraries available, (mostly for free), which provide a rich set of functions to perform data analysis. So, the logic to run the algorithm is developed and improved by the community; all you need to do is understand how to use the code and what it does!!

For example, let’s say you have two sets of numbers and you want to see if they are related, i.e. if there is a correlation between them. Have a look at the wiki, (here), and you discover there are several mathematical ways to find the correlation, depending on the type of relationship between the numbers, (linear, exponential, etc). Now, if you want to perform a Pearson’s coefficient calculation, there is a Python library that gives you that, or if you decide to use one of the rank coefficients, then likewise it’s just a different call!
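(To show just how small the difference between those calls is, here is a sketch using SciPy, assuming you have a scientific Python install; swapping Pearson for a rank-based Spearman coefficient really is one function name.)

```python
from scipy import stats

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]             # perfectly linearly related to x

r, _ = stats.pearsonr(x, y)      # Pearson: linear correlation coefficient
rho, _ = stats.spearmanr(x, y)   # Spearman: rank-based, same call shape
print(r, rho)                    # both 1.0 for this data
```

The maths behind the two coefficients is quite different, but as a user you only need to know which one suits the relationship you suspect is in the data.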

The tools are all there and everyone can use them, fairly simply… like woodwork! However, a skilled carpenter will produce a far superior product by selecting the correct tools and applying them with their past experience, superior knowledge and skills, as well as the insights they gain from looking at the raw materials! That is what distinguishes me, (the hacker), from a Data Scientist!

Next, stepping up from the basics…