What’s the problem with storing BIG DATA? (Big Data, Part 2)

In part one, I teased that the telco could neither get the data it needed nor analyse it.

Let’s drill down into the first issue. If the telco really wants to ‘know’ me as a customer, it would need to pull together all my call data records (and perhaps the call data records of the people I called) along with my internet usage (URLs visited)! Doing this for me and then for each of the other customers, one at a time, would not be practical; the telco needs this data for all its customers at once. That is a lot of data, hundreds of terabytes or maybe even petabytes, and keeping it all in one place is hard.
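To get a feel for how quickly this adds up, here is a rough back-of-envelope sketch. Every number in it (subscriber count, records per day, bytes per record) is a made-up assumption for illustration, not a figure from any real telco, and the function name is hypothetical too.

```python
def yearly_cdr_bytes(subscribers: int, records_per_day: int,
                     bytes_per_record: int, days: int = 365) -> int:
    """Raw bytes of usage records collected in one year (illustrative only)."""
    return subscribers * records_per_day * bytes_per_record * days


# Hypothetical inputs: 10 million subscribers, 200 call/URL records each
# per day, 500 bytes per record. None of these come from a real telco.
total = yearly_cdr_bytes(10_000_000, 200, 500)

# Decimal terabytes (1 TB = 10**12 bytes)
print(f"{total / 1e12:.0f} TB per year")
```

With these assumed inputs a single year of raw records already lands in the hundreds of terabytes, before any copies, indexes or history are kept.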

Data storage is not like a big box you just throw stuff into; you need a structure to make it work. That structure is created by software (RAID groups, file systems and volume managers), and each of those layers brings limitations at scale. Size is limited because the software is written with a fixed data word size, and the word size caps the amount of space that can be addressed. For example, 32-bit block addresses with 512-byte blocks top out at 2 TB. So in a 32-bit world the biggest single box that can be created is in the TB range, not big enough for our telco.
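You can do the math yourself. The sketch below assumes the address counts fixed-size blocks and that a block is 512 bytes (a common sector size, but an assumption here, as is the function name); real systems may spend the bits differently.

```python
TIB = 2 ** 40  # one tebibyte in bytes
ZIB = 2 ** 70  # one zebibyte in bytes


def max_addressable_bytes(address_bits: int, block_size_bytes: int = 512) -> int:
    """Largest volume reachable when each address names one block."""
    return (2 ** address_bits) * block_size_bytes


# 32-bit block addresses: 2**32 blocks * 512 bytes = 2 TiB
print(max_addressable_bytes(32) // TIB, "TiB")

# 64-bit block addresses: 2**64 blocks * 512 bytes = 8 ZiB
print(max_addressable_bytes(64) // ZIB, "ZiB")
```

The 64-bit figure shows that address space stops being the limit once you widen the word; it says nothing about whether the software layers perform at that size.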

One answer is to just re-compile for a 64-bit system; after all, Windows 7 has a 64-bit version! Yes, you can do this and publish a new competitive spec sheet, but it won’t perform at scale! The layers of software described above cause an architectural bottleneck (queues and messaging buses). There are also structures that you need to keep in memory, which creates a cost problem.

Perhaps you throw masses of hardware at it and cluster a number of systems together, using a coordinator or manager system to make it run. Let’s not go into the messaging and management issues with this approach!

So what is needed is essentially a petabyte-scale USB memory stick! That is what Isilon, the company EMC just bought, delivers. Technically, Isilon is elegant in its simplicity: it does the job of the RAID, file system and volume manager in one piece of software that was designed from scratch for scale! This software, branded OneFS, is the magic. It is fast because it is one layer, and it is scalable because it is duplicated in each node, giving a peer network of nodes that scales linearly.

Now, with this giant USB drive, the telco can put all this data in one place, and if it could analyse it in almost real time it would have an incredible competitive weapon…

To be continued in Part 3…!


