The term “big data” remains difficult to understand because
it can mean different things to different groups of people. Behavioural economist
Dan Ariely once compared Big Data to teenage sex: “everyone talks about it,
nobody really knows how to do it, and everyone thinks everyone else is doing
it, so everyone claims they are doing it.”
So what is Big Data?
Big data explained in simple terms by Bernard Marr:
The basic
idea behind the phrase 'Big Data' is that everything we do is increasingly
leaving a digital trace (or data), which we (and others) can use and
analyse.
Big Data
therefore refers to our ability to make use of the ever-increasing volumes
of data.
Types of data (datafication):
- Activity Data - Digital music players and eBooks collect data on our activities. Your smart phone collects data on how you use it and your web browser collects information on what you are searching for.
- Conversation Data - Most of our conversations leave a digital trail. Just think of all the conversations we have on social media sites like Facebook or Twitter. Even many of our phone conversations are now digitally recorded.
- Photo and Video Image Data - We upload and share hundreds of thousands of images on social media sites every second. A growing number of CCTV cameras capture video images, and we upload hundreds of hours of video.
- Sensor Data - Your smart phone contains a global positioning sensor that tracks exactly where you are every second of the day, and it includes an accelerometer that senses how the device is moving, from which the speed and direction of travel can be estimated.
- The Internet of Things Data - Smart TVs are able to collect and process data, and we now have smart watches, smart fridges, and smart alarms. The Internet of Things connects these devices.
Until recently, we had to wait a considerable amount of time to
gather data from around the world, analyze it, and take action. The
process was slow and inefficient. Contributing factors included computer
systems that were not fast enough to gather and store ever-changing
data (velocity), systems that could not accommodate the volume of data
pouring in from all of these sources (volume), systems that could not
process images and media files such as X-rays and MP3s (variety), and
the messiness or untrustworthiness of the data (veracity). Big Data
technology changed this by addressing the
velocity-volume-variety-veracity problem.
How is it different from traditional BI?
To understand the difference between Big Data and
traditional BI, let’s first look at how analytics has evolved over
time:
The goal of any analytics solution is to provide the
organization with actionable insights for smarter decisions and better business
outcomes. Once you have enough data, you start to see patterns and you then
start building a model of how these data work. Once you build a model, you can
predict.
Different types of analytics, however, provide different
types of insights (refer to the figure above). Analytics models are moving from descriptive
through predictive to prescriptive analytics.
- Descriptive Analytics (the first step: insight into the past). This is the simplest class of analytics. It condenses data into smaller, more useful nuggets of information, using data aggregation and data mining techniques to summarize raw data into something humans can interpret. It provides insight into the past and answers: “What has happened?”
- Predictive Analytics (understand the future). It uses a variety of statistical, modelling, data mining, and machine learning techniques to study recent and historical data, allowing analysts to make predictions about the future. Predictive analytics can only forecast what might happen, because it rests on probabilities: it uses statistical models and forecasting techniques to answer: “What could happen?”
- Prescriptive Analytics (advise on possible outcomes). The relatively new field of prescriptive analytics allows users to “prescribe” a number of different possible actions and guides them towards a solution. In a nutshell, it predicts not only what will happen but also why it will happen, and provides recommendations for actions that take advantage of the predictions. It uses optimization and simulation algorithms to advise on possible outcomes and answer: “What should we do?”
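The three types of analytics above can be sketched on a toy example. The snippet below is only an illustration under stated assumptions: the sales figures, the candidate stock levels, the margin and holding cost, and the function names `describe`, `predict_next`, and `prescribe` are all invented here, and a straight-line trend stands in for whatever statistical model a real project would choose.

```python
from statistics import mean

# Hypothetical monthly unit sales, invented purely for illustration.
monthly_sales = [100, 110, 120, 130, 140, 150]


def describe(sales):
    """Descriptive: summarize what has already happened."""
    return {
        "total": sum(sales),
        "average": mean(sales),
        "best_month": sales.index(max(sales)) + 1,  # 1-based month number
    }


def predict_next(sales):
    """Predictive: fit a least-squares line and extrapolate one month ahead."""
    n = len(sales)
    xs = list(range(1, n + 1))
    x_bar, y_bar = mean(xs), mean(sales)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, sales)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return slope * (n + 1) + intercept


def prescribe(forecast, stock_options, unit_margin=5.0, unit_holding_cost=2.0):
    """Prescriptive: pick the stock level with the best profit under the forecast.

    The margin and holding cost are assumed values, not real business figures.
    """

    def profit(stock):
        sold = min(stock, forecast)
        unsold = max(stock - forecast, 0)
        return sold * unit_margin - unsold * unit_holding_cost

    return max(stock_options, key=profit)


summary = describe(monthly_sales)        # "What has happened?"
forecast = predict_next(monthly_sales)   # "What could happen?"
recommended = prescribe(forecast, [140, 160, 180])  # "What should we do?"
```

Each step feeds the next: the summary describes the past, the forecast estimates next month, and the recommendation turns that estimate into an action by comparing a small set of options.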
Now let’s compare traditional BI and analytics (descriptive
and predictive) with Big Data (plus prescriptive analytics).
Traditional business intelligence (BI) has always been
top-down, putting data in the hands of executives and managers who are looking
to track their businesses on the big-picture level. Big Data, on the other
hand, is bottom-up. It empowers business end-users to carry out in-depth
analysis to inform real-time decision-making.
BI is about making decisions and analytics is about asking
questions: Which product model got the most complaints? What is the lead
conversion ratio of a particular product? Which products are selling more in
north-east states? In other words, traditional BI and analytics are about
getting answers you already know are important, and because you know they’re
important you put mechanisms in place to produce the key metrics. Big Data, on
the other hand, is about finding answers to questions you didn’t even know you
had.
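The contrast can be made concrete with a small sketch. The complaint records, field names, and the helpers `top_value` and `most_concentrated_field` below are hypothetical, and the exploratory scan is a deliberately naive stand-in for real exploratory analysis: the first helper answers a question chosen in advance, while the second looks across every field without being told where to look.

```python
from collections import Counter

# Hypothetical complaint records, invented purely for illustration.
complaints = [
    {"model": "A100", "region": "north-east", "channel": "web"},
    {"model": "A100", "region": "south", "channel": "store"},
    {"model": "B200", "region": "north-east", "channel": "web"},
    {"model": "A100", "region": "north-east", "channel": "web"},
    {"model": "C300", "region": "west", "channel": "web"},
]


def top_value(records, field):
    """BI-style: answer a predefined question about one known field."""
    counts = Counter(record[field] for record in records)
    return counts.most_common(1)[0]  # (value, count)


def most_concentrated_field(records):
    """Exploratory: scan every field and report where records cluster most."""
    best = None
    for field in records[0]:
        value, count = top_value(records, field)
        share = count / len(records)
        if best is None or share > best[2]:
            best = (field, value, share)
    return best  # (field, value, share of records)


# The question we already knew to ask: which model got the most complaints?
worst_model = top_value(complaints, "model")

# The question we did not know to ask: is some other attribute more telling?
surprise = most_concentrated_field(complaints)
```

In this made-up data the predefined metric points at one product model, while the scan shows that most complaints share the same channel, which is the kind of pattern nobody set up a report for.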
The scope of traditional BI is limited to structured data
that can be stuffed into columns and rows in a data warehouse. BI could never
have anticipated the multitude of images, MP3 files, videos and social media
snippets that companies would contend with. Big Data refers to the
immense volumes of data (structured and unstructured) available online and in
the cloud, which requires ever more computing power to gather and analyze.
Prescriptive analytics is the future of Big Data. Its potential
is enormous, but it also requires massive amounts of data to be able to make
correct decisions. You have to collect, store, analyze, organize, purge, and
use the data. It's that process, from collection to use to purge, that is the
great unknown of big data. I hope you find this article helpful in connecting
the dots.