In the last post I discussed the basics of getting practical Big Data experience. Now, armed with some tools, I needed some data to work on!
Well, I mentioned Kaggle.com last time and it has lots of data you can use, including data from the Titanic if you want some fun! They even teach you how to sample the dataset, in order to have ‘fresh’ data to test your methods/models/algorithms against. But I wanted to play with live data… so before we begin, who is the guy in the picture? A couple of hints: he ran one of the biggest Big Data projects ever. He got some guy elected.
His name is Harper Reed and he was the CTO of the “Obama for America” campaign. (That was Obama’s second election!) Some of you might have seen him, as he was in Australia last year doing the speaking circuit thing. However, before the Obama campaign, Harper was the CTO of Threadless in Chicago (a story for another time), and while living in Chicago he did a few interesting things. Like hack into their transit authority and make their bus data available (http://ctabustracker.com)! You can use it to get the position data of the buses traveling around Chicago right now! And while you are ‘in Chicago’ you might want to see where the city spends its money (http://www.citypayments.org), with some ‘goofy’ payments highlighted on the site! I even found their food health inspection data, so if you are ever visiting and want to see where not to eat, this is a goldmine of information.
So, in awe of the access to data in the US, I wondered if I could get some more interesting data closer to home, and I started with transport. I did find a couple of interesting things. Firstly, I take the train to work fairly often and use an app called TripView, which shows me the train timetable. I noticed last year that it started showing on-time running and delays in real time, so I guessed the data feed must be out there somewhere.
My search revealed that only a few organisations seem to have been granted access to the API, but in this open and connected world is that right? If you think that access to public data should be open, you can vote by signing this petition (here) to get the links opened up! The irony is that in this new data-driven and connected world there are always people out there who will find a way, and some people have. They have accessed the interface between the mobile app and its data sources… I lost interest, but if you would like to go further have a look here. To me the interesting point is old vs. new thinking: people are still living in what IDC calls Platform 2 (client/server), while the world is moving to the social, mobile and big data world of Platform 3!
Moving on (sorry), I did find that the old RTA (now the new RMS) has granted access to data, such as travel times on the old F3 (now the M1) from Sydney to Newcastle. (Which goes to prove that you can change the name, but the traffic nightmare remains!) Their site, livetraffic.rta.nsw.gov.au, offers data in all sorts of forms, including an API which you can access for the cost of your e-mail address.
There are other sources, such as data.gov.au, but as I’m past my self-imposed word limit we won’t go there. In summary, some data in Australia is out there in a useful form, but so much is not open, and without open data how can Australia become an innovative country in today’s world?