My Big Data Journey – The Basics!

It is All Out There!
Oh wow! There is so much out there to help you get going in Big Data… it’s almost a Big Data problem in itself.

My first step was to brush up on my programming/language skills, since it’s been years since I’ve tried to seriously write any code. I’d been playing around with Python for a little while so I naively Googled “Python for Big Data” and only got 29.4 million hits… so I took a course from http://www.coursera.org to sharpen my Python skills, then followed a Python for Big Data tutorial given via YouTube and I was almost ready to go!

Lastly, to get some practical experience, I dropped by kaggle.com, signed up and went through a couple of their tutorials, which come with datasets and advice. Now, I’m not a data scientist, but I do have a taste of what they are up against in a practical sense.
You Don’t Need to Know The Math!
This will probably get me into trouble! However, I suggest that you don’t need to know how the algorithms work, you just need to know what they do! If you are not a Python person, then you need to know that there are a myriad of libraries available (mostly for free) which provide a rich set of functions for data analysis. So the logic to run the algorithm has already been developed and is being improved by the community. All you need to do is understand how to use the code and what it does!

For example, let’s say you have two sets of numbers and you want to see if they are related, i.e. if there is a correlation between them. You have a look at the wiki and discover there are several mathematical ways to find the correlation, depending on the type of relationship between the numbers (linear, exponential, etc). Now, if you want to calculate Pearson’s coefficient, there is a Python library that gives you that. Or if you decide to use one of the rank coefficients instead, then likewise, it’s just a different call!

The tools are all there and everyone can use them fairly simply… like woodwork! However, a skilled carpenter will produce a far superior product by selecting the correct tools and applying them with experience, knowledge and skill, as well as the insights gained from looking at the raw materials! That is what distinguishes me (the hacker) from a Data Scientist!

Next: stepping up from the basics…
