Saturday, March 12, 2016

Big Data Basics - Part 1

The term “big data” remains difficult to pin down because it means different things to different people. Behavioural economist Dan Ariely once compared Big Data to teenage sex: “everyone talks about it, nobody really knows how to do it, and everyone thinks everyone else is doing it, so everyone claims they are doing it.”

So what is Big Data?
Big data explained in simple terms by Bernard Marr:
The basic idea behind the phrase 'Big Data' is that everything we do is increasingly leaving a digital trace (or data), which we (and others) can use and analyse.
Big Data therefore refers to our ability to make use of the ever-increasing volumes of data.

Types of data (datafication):
  • Activity Data - Digital music players and eBook readers collect data on our activities. Your smartphone collects data on how you use it, and your web browser collects information on what you are searching for.
  • Conversation Data - Most of our conversations leave a digital trail. Just think of all the conversations we have on social media sites like Facebook or Twitter. Even many of our phone conversations are now digitally recorded.
  • Photo and Video Image Data - We upload and share hundreds of thousands of photos and videos on social media sites every second. A growing number of CCTV cameras capture video footage, and we upload hundreds of hours of video.
  • Sensor Data - Your smartphone contains a global positioning sensor that tracks exactly where you are every second of the day, and an accelerometer that tracks the speed and direction at which you are travelling.
  • The Internet of Things Data - Smart TVs can collect and process data, and we now have smart watches, smart fridges, and smart alarms. The Internet of Things connects these devices.


Currently we need to wait a considerable amount of time to gather data from around the world, analyze it, and take action. The process is slow and inefficient, and the contributing factors include: not having computer systems fast enough to gather and store ever-changing data (velocity); not having computer systems that can accommodate the volume of data pouring in from all of the sources (volume); not having computer systems that can process images, media files, and other formats, e.g. X-rays and MP3s (variety); and the messiness or untrustworthiness of the data (veracity). Big Data technology addresses these issues by solving the velocity-volume-variety-veracity problem.

How is it different from traditional BI?
To understand the difference between Big Data and traditional BI, let’s first look at how analytics has changed and improved over time:


The goal of any analytics solution is to provide the organization with actionable insights for smarter decisions and better business outcomes. Once you have enough data, you start to see patterns, and you can then build a model of how the data behaves. Once you have a model, you can predict.
Different types of analytics, however, provide different types of insights (refer to the figure above). Analytics models are moving from descriptive analytics, through predictive, to prescriptive; a short code sketch after the list below illustrates all three.
  1. Descriptive Analytics (the first step: insight into the past). This is the simplest class of analytics; it allows you to condense data into smaller, more useful nuggets of information. It uses data aggregation and data mining techniques to summarize raw data into something interpretable by humans, providing insight into the past and answering: “What has happened?”
  2. Predictive Analytics (predict/understand the future). It uses a variety of statistical, modelling, data mining, and machine learning techniques to study recent and historical data, allowing analysts to make predictions about the future. Predictive analytics can only forecast what might happen, because it is based on probabilities; it uses statistical models and forecasting techniques to understand the future and answer: “What could happen?”
  3. Prescriptive Analytics (advise on possible outcomes). The relatively new field of prescriptive analytics allows users to “prescribe” a number of different possible actions and guides them towards a solution. In a nutshell, this class of analytics predicts not only what will happen, but also why it will happen, and provides recommendations on actions that take advantage of the predictions. It uses optimization and simulation algorithms to advise on possible outcomes and answer: “What should we do?”
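
To make the three classes concrete, here is a minimal Python sketch. The monthly sales figures, the column names, and the 10% safety margin are all assumptions made up for illustration; they are not from any real dataset.

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy dataset: 12 months of units sold (hypothetical numbers).
df = pd.DataFrame({
    "month": np.arange(1, 13),
    "units_sold": [120, 135, 150, 160, 172, 185,
                   200, 214, 230, 245, 260, 278],
})

# 1. Descriptive: aggregate the past ("What has happened?").
print("Average monthly sales:", df["units_sold"].mean())
print("Total annual sales:", df["units_sold"].sum())

# 2. Predictive: fit a statistical model to historical data and
#    forecast ("What could happen?").
X, y = df[["month"]], df["units_sold"]
model = LinearRegression().fit(X, y)
forecast = model.predict(pd.DataFrame({"month": [13]}))[0]
print(f"Forecast for month 13: {forecast:.0f} units")

# 3. Prescriptive: turn the prediction into a recommended action
#    ("What should we do?") -- here, how much stock to order,
#    using an assumed 10% safety margin.
recommended_order = forecast * 1.10
print(f"Recommended order quantity: {recommended_order:.0f} units")

A real prescriptive system would replace the last step with proper optimization or simulation over many scenarios, but the shape is the same: aggregate, model, then recommend.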


Now let’s compare traditional BI and analytics (descriptive and predictive) with Big Data (plus prescriptive).

Traditional business intelligence (BI) has always been top-down, putting data in the hands of executives and managers who are looking to track their businesses on the big-picture level. Big Data, on the other hand, is bottom-up. It empowers business end-users to carry out in-depth analysis to inform real-time decision-making.

BI is about making decisions, and analytics is about asking questions: Which product model got the most complaints? What is the lead conversion ratio of a particular product? Which products are selling more in the north-east states? In other words, traditional BI and analytics are about getting answers you already know are important, and because you know they’re important you put mechanisms in place to produce the key metrics (a small sketch of such a query follows below). Big Data, on the other hand, is about finding answers to questions you didn’t even know you had.
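
As a sketch of that kind of known-in-advance question, here is how “Which products are selling more in the north-east states?” might be answered in Python with pandas. The records and column names are invented for illustration.

import pandas as pd

# Toy sales records (hypothetical data).
sales = pd.DataFrame({
    "product": ["A", "B", "A", "C", "B", "A"],
    "region":  ["north-east", "north-east", "south",
                "north-east", "west", "north-east"],
    "units":   [30, 45, 20, 25, 50, 35],
})

# Filter to the region of interest, then rank products by units sold.
north_east = sales[sales["region"] == "north-east"]
ranking = (north_east.groupby("product")["units"]
           .sum()
           .sort_values(ascending=False))
print(ranking)

The question, the metric, and the report are all fixed up front; nothing about this query can surface a pattern you did not already think to ask about.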

The scope of traditional BI is limited to structured data that can be stuffed into the columns and rows of a data warehouse. BI could never have anticipated the multitude of images, MP3 files, videos and social media snippets that companies would contend with. Big Data refers to the immense volumes of data (structured and unstructured) available online and in the cloud, which require ever more computing power to gather and analyze.


Prescriptive analytics is the future of Big Data. Its potential is enormous, but it also requires massive amounts of data to make correct decisions. You have to collect, store, analyze, organize, purge, and use the data, and it is that process, from collection to use to purging, that is the great unknown of Big Data. I hope you find this article helpful in connecting the dots.