Integrating Big Data is Like Mixing Oil and Water

I have never liked the term ‘big data’; it tends to be an overused buzzword. I prefer the term ‘large unstructured data’ because it is a more accurate description. As one might expect, it is difficult to combine structured data with unstructured data effectively. In many ways it is like mixing oil and water. If you put oil and water together in a jar and shake it, you will get a mixture, but it is unstable and useless for pretty much anything. Data is much the same. You cannot just mash everything together and expect to gain useful insights; you need the right approach.

Actually, Oil and Water Do Mix

Contrary to the old saying, oil and water do mix; you just have to know what you are doing. Don’t believe me? Here is an article explaining how it works. The key to making it work is something called an emulsifier.

Many organizations are missing that ingredient as they try to ‘mix’ their data. In data management terms, the emulsifier is the data warehouse. It acts as the core of your data management system and relates structured data to unstructured data in a meaningful way. The ability to relate data in its various forms to your core data warehouse dimensions is where you gain business insights and add value to your data.
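To make the emulsifier idea a little more concrete, here is a minimal sketch of what relating unstructured data to a warehouse dimension can look like. Everything in it is an assumption made up for illustration: the `dim_customer` and `stg_post` tables, the sample posts, and the crude "mentions slow" flag are hypothetical, and an in-memory SQLite database merely stands in for a real warehouse.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse (an assumption for this sketch).
conn = sqlite3.connect(":memory:")

# Hypothetical customer dimension: clean, structured, one row per customer.
conn.execute(
    "CREATE TABLE dim_customer ("
    "customer_id INTEGER PRIMARY KEY, handle TEXT, region TEXT)"
)
conn.executemany(
    "INSERT INTO dim_customer VALUES (?, ?, ?)",
    [(1, "@ana", "East"), (2, "@raj", "West")],
)

# Hypothetical unstructured input: free-text social posts.
posts = [
    "@ana: love the new checkout flow",
    "@raj: checkout keeps failing, very slow",
    "@ana: shipping was slow this time",
    "@unknown: slow site",
]


def parse_post(post):
    """Pull a join key (the handle) and one crude signal out of raw text."""
    handle, _, text = post.partition(":")
    return handle.strip(), int("slow" in text.lower())


# Stage the extracted attributes so they can be joined like any other table.
conn.execute("CREATE TABLE stg_post (handle TEXT, mentions_slow INTEGER)")
conn.executemany(
    "INSERT INTO stg_post VALUES (?, ?)", [parse_post(p) for p in posts]
)

# Relate the staged posts to the dimension and aggregate by region.
# The inner join drops posts whose handle is not a known customer.
rows = conn.execute(
    """
    SELECT d.region, COUNT(*) AS posts, SUM(s.mentions_slow) AS slow_mentions
    FROM stg_post AS s
    JOIN dim_customer AS d ON d.handle = s.handle
    GROUP BY d.region
    ORDER BY d.region
    """
).fetchall()

for region, post_count, slow_mentions in rows:
    print(region, post_count, slow_mentions)
```

The point of the sketch is only that the raw text becomes useful once it is tied to a dimension the business already trusts, here a region. A real pipeline would need far better text processing and key matching than a substring check.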

Why Data Warehouses are Being Marginalized

Over the last five years, ‘big data’ projects have been started by many but successfully implemented by few. I feel a lot of the problem has been the willingness of software vendors to sell solutions that overstate features and downplay the importance of having a structured data warehouse. Very few vendors were advocating data warehouses; instead, they were recommending in-memory solutions for projects of any size. Nothing was impossible. This message trickled down to the business users, who in turn felt empowered to install analytics software and mash together data sets locally.

The datasets became so large that the business users struggled to join them. In response, they requested larger machines to plow through the data; this, of course, was unsustainable. Processing power could not keep pace with the growth of the data, because customers were generating social and online activity data too quickly. Now what?

We circle back to a single source for structured, aggregated data: the data warehouse. There is much to be said for having a quality source of structured data to act as the building block of an analytics platform.

I very much agree with this article from Gartner titled ‘Debunking the 5 Greatest Myths of Big Data’. It is a refreshing look at the reality of big data.

No Free Lunches Yet

If you want to get real value from data-based insights, you need clean, structured, and consistent data organized in a data warehouse. This is the current reality for most organizations, and if you ask me, it will remain so for the foreseeable future.


James Ciesla

James Ciesla is an IT professional who specializes in data management, data analytics, and IT strategy.
