Big Data

WDABT 2016 – BHARATHIAR UNIVERSITY

BIG DATA ROADMAP

Dr.V.Bhuvaneswari
Assistant Professor
Department of Computer Applications
Bharathiar University
Coimbatore
visit at www.budca.in/faculty.php
Big Data Roadmap
 Timeline – Big Data Predictions
 Data Growth in Units
 Data Landscape
 Data Explosion
 Big Data Myths
 Big Data
 5Vs of Big Data
 Why Big Data
 Data as Data Science

Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Computer Applications, Bharathiar University
Timeline – Big Data Predictions
1944- Yale Library in 2040 will have “approximately 200,000,000 volumes”
1961- Scientific journals will grow exponentially rather than linearly, doubling every fifteen years and increasing by a factor of ten during every half-century
1975- Ministry of Posts and Telecommunications in Japan introduced words as a unifying unit of measurement
1997- Michael Cox and David Ellsworth published the first article in the ACM Digital Library to use the term “Big data”
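The two rates in the 1961 prediction can be checked against each other: a quantity that doubles every fifteen years grows by a factor of 2^(50/15) in half a century, which is close to ten. A minimal sketch of that arithmetic follows; the function name `growth_factor` is illustrative and not from the slides.

```python
def growth_factor(doubling_period_years: float, span_years: float) -> float:
    """Growth multiple over span_years for a quantity that doubles
    every doubling_period_years."""
    return 2 ** (span_years / doubling_period_years)


# Doubling every 15 years, observed over a half-century (50 years).
half_century = growth_factor(15, 50)
print(f"Growth over 50 years: x{half_century:.2f}")  # x10.08
```

The result, roughly 10.08, agrees with the "factor of ten during every half-century" stated in the prediction.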

Big Data emerged in 1997, exploded to greater heights in 2010, and became popular in 2012.
Data Growth – in Units



Data Landscape

BIG DATA FACTS
 Every 2 days we create as much information
as we did from the beginning of time until
2003
 Over 90% of all the data in the world was
created in the past 2 years.
 It is expected that by 2020 the amount of
digital information in existence will have
grown from 3.2 zettabytes today to 40
zettabytes.
 Every minute we send 204 million emails,
generate 1.8 million Facebook likes, send
278 thousand Tweets, and upload 200,000
Big Data Explosion
• 12+ TBs of tweet data every day
• 30 billion RFID tags today (1.3B in 2005)
• 4.6 billion camera phones worldwide
• ? TBs of data every day
• 100s of millions of GPS-enabled devices sold annually
• 25+ TBs of log data every day
• 76 million smart meters in 2009… 200M by 2014
• 2+ billion people on the Web by end 2011
Data Deluge
Big Data Market Size
Potential Talent Pool - Big Data

India will require a minimum of 1 lakh data scientists in the next couple
of years in addition to data analysts and data managers to support the
Big Data space.
BIG DATA MYTHS
Big Data
• Is New
• Is Only About Massive Data Volume
• Means Hadoop
• Needs A Data Warehouse
• Means Unstructured Data
• Is for Social Media & Sentiment Analysis
Let Us Clarify

Big Data
Big Data is
 A complete subject with tools, techniques and frameworks.
 A technology that deals with large and complex datasets which vary in data format and structure and do not fit into memory.
 Not only about huge volumes of data; it provides an opportunity to find new insights into existing data and guidelines to capture and analyze future data
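The point about datasets that do not fit into memory can be illustrated with chunked processing: read a bounded number of records at a time and keep only a running result. The sketch below is an illustration, not a method from the slides; the helper names `read_in_chunks` and `running_total` are hypothetical, and an in-memory `io.StringIO` stands in for a large file.

```python
import io
from typing import Iterator, TextIO


def read_in_chunks(stream: TextIO, lines_per_chunk: int) -> Iterator[list[str]]:
    """Yield lists of at most lines_per_chunk lines, so that only one
    chunk is held in memory at a time."""
    chunk: list[str] = []
    for line in stream:
        chunk.append(line.rstrip("\n"))
        if len(chunk) == lines_per_chunk:
            yield chunk
            chunk = []
    if chunk:  # final, possibly shorter, chunk
        yield chunk


def running_total(stream: TextIO, lines_per_chunk: int = 2) -> int:
    """Sum one integer per line without loading the whole stream."""
    total = 0
    for chunk in read_in_chunks(stream, lines_per_chunk):
        total += sum(int(value) for value in chunk)
    return total


# Hypothetical stand-in for a file too large to load at once.
sample = io.StringIO("10\n20\n30\n40\n50\n")
print(running_total(sample))  # 150
```

Frameworks such as Hadoop apply the same idea at cluster scale by splitting the input across machines instead of across loop iterations.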
Big Data : A Definition
 Big data is the realization of greater
business intelligence by storing,
processing, and analyzing data that
was previously ignored due to the
limitations of traditional data
management technologies

Source: Harness the Power of Big Data: The IBM Big Data Platform

BIG DATA as Platform

Source: IBM
4 V's of Big Data



5Vs of Big Data
 Volume
 Velocity
 Variety
 Veracity
 Value



Why Big Data?

The 5 Key Big Data Use Cases

• Big Data Exploration: Find, visualize, understand all big data to improve decision making
• Enhanced 360° View of the Customer: Extend existing customer views (MDM, CRM, etc) by incorporating additional internal and external information sources
• Security/Intelligence Extension: Lower risk, detect fraud and monitor cyber security in real-time
• Operations Analysis: Analyze a variety of machine data for improved business results
• Data Warehouse Augmentation: Integrate big data and data warehouse capabilities to increase operational efficiency
Data Science
 "Data Science" was used by statisticians and economists in the early 1970s and defined by Peter Naur in 1974.
 "Data Science" has gained popularity in the last couple of years because of the massive data deposits.
 Usage of Big Data technology to explore data in large corporations, government and industries made the term data science catchy.
Data Science as Discipline
 Data Science has emerged as a new discipline to provide deep insight into large volumes of data.
 Data Science is a fusion of major disciplines like Computational Algorithms, Statistics and Visualization.
 90% of the world’s data has been created in the last two years, which includes 10% structured data and 80% unstructured data.
 The digital universe is in a data deluge and is estimated to be larger than the physical universe; the unit of data measurement is predicted to reach Geopbytes.

Data Growth in Bytes



Data Classification
◦ Open Data
◦ Closed Data
◦ Hot Data
◦ Warm Data
◦ Cold Data
◦ Thin Data
◦ Thick Data



Data Analytics – Need for Today
 Data is considered a digital asset similar to other property.
 Organizations believe the data they generate will provide deep insights into their business processes for arriving at strategic decisions.
 The earlier limitations of computational storage and processing are overcome by the technologies of cloud computing and big data techniques.
Data Science Components

• Data Models: Linear Regression, Decision Tree, Dimensionality Reduction
• Pre-Processing: ETL
• Dashboards: Charts (Pie, Bar, Histogram)
• Clustering, Outlier Analysis, Association Analysis
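Of the data models named above, linear regression is the simplest to show end to end. The following is a minimal sketch of an ordinary least-squares fit for one input variable, written in plain Python; the function name `fit_line` and the sample points are illustrative assumptions, not material from the slides.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points that lie exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # 2.0 1.0
```

In practice a library such as scikit-learn or R would be used for this; the hand-written version only shows what the model computes.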



Data Science - Big Data Technology
 Collect, Load, Transform
◦ ETL SCRIBE, FLUME
 Store
◦ HADOOP, SPARK, STORM
 Process, Analyze and Reasoning
◦ Computational Algorithms,
◦ Statistical Methods and Models
 R, PIG, HIVE,
 PYTHON, JAVA, SCALA,
 CLOJURE, MAHOUT
 Visualization
◦ DASHBOARD, APP
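The four stages listed above (collect and transform, store, analyze, visualize) can be sketched as a chain of small functions. This is a toy illustration in plain Python: a list stands in for a distributed store, a text bar chart stands in for a dashboard, and none of the tools named on the slide are used. All function names and the sample records are hypothetical.

```python
from collections import Counter


def collect(raw_lines):
    """Collect, Load, Transform: skip blanks, normalise case, split fields."""
    records = []
    for line in raw_lines:
        line = line.strip()
        if not line:
            continue
        user, action = line.lower().split(",")
        records.append({"user": user, "action": action})
    return records


def store(records):
    """Store: a plain list stands in for a distributed data store."""
    return list(records)


def analyze(stored):
    """Process and Analyze: count events per action type."""
    return Counter(record["action"] for record in stored)


def visualize(counts):
    """Visualization: one text bar per action, sorted by action name."""
    return [f"{action:<6} {'#' * n}" for action, n in sorted(counts.items())]


# Made-up event log with inconsistent case and a blank line.
raw = ["Asha,LIKE", "", "Ravi,tweet", "asha,like", "Mala,LIKE"]
for row in visualize(analyze(store(collect(raw)))):
    print(row)
```

Keeping each stage as a separate function mirrors the way the slide separates tools by stage, so any one stage could be swapped for a real system without touching the others.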
Data Science Vs Data Analytics
 Data Science is a discipline which groups techniques and methods from various domains to study data; data analytics is a component of Data Science.
 Data Analytics is a process of analyzing a dataset to find deep insights into the data using computational algorithms and statistical methods.
There exists no common procedure to
Data Analytics Vs Big Data Analytics
 Data Analytics is used to explore and analyze datasets using statistical methods and models.
 Big Data Analytics is used to analyze data with the characteristics of Volume, Velocity and Variety by integrating statistics, mathematics and computational algorithms on a Big Data platform.
Data Science – Emerging Roles
 A Data Scientist is responsible for scrubbing data to bring out deep insights from the data.
Skills: Expert in CS, Mathematics, Statistics
Works on open-ended research problems
 A Data Engineer is responsible for managing and administering the infrastructure and storage of data.
Skills: Strong skills in Programming and Software Engineering
 Deep knowledge of data warehousing
 Expertise in Hadoop, NoSQL and SQL technologies
 A Data Analyst is one who views the data from one source and has deep insight into the data based on the organization's guidance.
Skills: Competency in understanding Statistics



Data Analytics Use Case Scenario

Data Science Applications
 Data Personalization - Logs, Tweets, Likes
 Smart Pricing – Air Transportation
 Financial Services – Fraud Detection
Insurance
 Smart Grids – Energy Management



Air Fare Management – Use Case 1
Objectives: Hike airfare based on High Value Customers - CRM.
Strategic decision requires understanding of data insights
How are customers divided?
Which customers are high-value customers?
Who is a frequent flyer?
How to retain customers?
Data sources:
Conventional Enterprise information
Data from weblogs, social media, competitors' pricing
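The questions above (how customers are divided, who is high value, who is a frequent flyer) amount to a segmentation rule over customer records. The sketch below shows one very simple rule-based version. The `Customer` fields, the two thresholds, and the segment labels are all hypothetical choices made for illustration; the slides do not define them, and a real airline would derive them from its own CRM data.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    flights_per_year: int
    annual_spend: float


# Hypothetical cut-offs; not taken from the source.
FREQUENT_FLYER_MIN_FLIGHTS = 12
HIGH_VALUE_MIN_SPEND = 200_000.0


def segment(customer: Customer) -> str:
    """Assign a customer to one of four illustrative segments."""
    frequent = customer.flights_per_year >= FREQUENT_FLYER_MIN_FLIGHTS
    high_value = customer.annual_spend >= HIGH_VALUE_MIN_SPEND
    if frequent and high_value:
        return "high-value frequent flyer"
    if high_value:
        return "high-value"
    if frequent:
        return "frequent flyer"
    return "occasional"


# Made-up customers for demonstration only.
customers = [
    Customer("C1", flights_per_year=20, annual_spend=350_000.0),
    Customer("C2", flights_per_year=2, annual_spend=250_000.0),
    Customer("C3", flights_per_year=15, annual_spend=90_000.0),
    Customer("C4", flights_per_year=1, annual_spend=8_000.0),
]
for c in customers:
    print(c.name, "->", segment(c))
```

Fixed thresholds are the crudest option; the clustering and decision-tree techniques mentioned elsewhere in the deck would learn such boundaries from the data instead.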



Data Engineering
Airfare Classification (Economy, Business, First)
Analyse factors (Enterprise Data sources) – Data Exploration techniques
Passenger Booking information
Forecasted data - Statistics
Inventory
Customer Behavioral data - Predictive Analytics – Statistical models – Decision tree, classification
Information has to be gained from websites that provide route information, dining, preferable locations
Holistic Analytics
Analyzing customer data from Social profiles, sales, CRM etc.



Complexities and Challenges
Data is larger than terabytes
Data integration
Variety of data formats
Solution
Big data Accelerators
Hadoop ecosystem
Analytic components
Integrated data warehouses

Source: Big Data Spectrum, Infosys


Insurance Fraud Detection – Use Case Scenario
Data Engineering
Verifying customer data
Customer Profile analysis
Verification of claims raised
Fraud detection from disparate systems
Exact claim reimbursement
Data Sources
Data about customer, product sold from ERP,
CRM
Credit history from other sources
Data from social networking – Customer
profiles, product rating, credit rating from 3rd
parties
Health Epidemics
Data Engineering
Kind of epidemics and target users
Causes and effects with respect to locations
Environmental and other related issues of
epidemics
Data on Awareness
Data Sources
EHR records, Medical Insurance claims,
Social media – awareness, ERP Systems
Data Analytics
Descriptive Analytics
Predictive Analytics (Model based analysis)
Big Data Challenges
Privacy Protection
All Big Data stages: collect, store, process, knowledge
Integration with enterprise landscape
All systems store data in RDBMS, DW
Does not support bulk loading to Big Data store
Limited number of analytics from Mahout
Big Data technologies lack visualization support and deliverable methods
Leveraging cloud computing for Big Data applications
Addressing real-time needs with varied format and volume
PART B : Big Data Use Cases – Scenario

Big Data Applications



Big Data Applications - India
 Big Data – Elections
 SBI uses big data mining to check
defaults
 Karnataka Govt – Identify water
leakage



Big Data - Election
 Mined data from every Internet user in the country, to accurately understand voter sentiments and local issues.
 Data-based analysis was used to raise funds and create different models for different regions targeting local issues.
 Elections in India involve more than 800 million voters with different ideologies and expectations.
 Innovative usage of Big Data marked a huge change in the way elections were fought traditionally.
Data Analytics
 Modak Analytics built the electoral data.
 This involved processing huge volumes of unstructured data (around 10TB of PDF documents), and also structured data.
 Modak chose Hadoop, and self-built a 64-node cluster that had 128TB of storage. Apart from Hadoop, the team used PostgreSQL as the front-end database.
 They have developed Rapid ETL to
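The cluster figures quoted above (64 nodes, 128TB of storage) can be put in perspective with some quick arithmetic. The sketch below assumes that the 128TB is raw capacity spread evenly across the nodes and that data is kept at HDFS's default replication factor of 3. The slides state neither, so the results are illustrative only.

```python
def per_node_capacity_tb(total_tb: float, nodes: int) -> float:
    """Average storage per node, assuming an even spread (an assumption)."""
    return total_tb / nodes


def usable_capacity_tb(raw_tb: float, replication_factor: int = 3) -> float:
    """Capacity left for distinct data when every block is stored
    replication_factor times (3 is the HDFS default)."""
    return raw_tb / replication_factor


print(per_node_capacity_tb(128, 64))      # 2.0
print(round(usable_capacity_tb(128), 1))  # 42.7
```

Under these assumptions, the roughly 10TB of PDF documents mentioned above would fit several times over even after replication.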
SBI
 State Bank of India (SBI) ran its newly
acquired data-mining software recently to
check for purity of data.
 Made an interesting find - close to one crore
accountholders have not provided any
nomination for their savings accounts. What
is worse, over half of them are senior
citizens.
 To analyse trends in Banks, SBI has hired a
whole team of statisticians and economists.
 Identify default patterns, high value
customers.
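The "purity of data" check described above, which found accounts with no nomination, is at heart a scan for a missing field. A minimal sketch follows. The record layout, the sample accounts, and the age cut-off of 60 for a senior citizen are assumptions made for illustration and are not details given in the slides.

```python
SENIOR_AGE = 60  # assumed cut-off; not stated in the source


def missing_nominee(accounts):
    """Return accounts whose nominee field is absent, None, or blank."""
    return [a for a in accounts if not (a.get("nominee") or "").strip()]


def senior_share(accounts, senior_age=SENIOR_AGE):
    """Fraction of the given accounts held by people at or above senior_age."""
    if not accounts:
        return 0.0
    seniors = sum(1 for a in accounts if a["age"] >= senior_age)
    return seniors / len(accounts)


# Made-up records for demonstration only.
accounts = [
    {"id": "A1", "age": 67, "nominee": "R. Kumar"},
    {"id": "A2", "age": 72, "nominee": None},
    {"id": "A3", "age": 34, "nominee": ""},
    {"id": "A4", "age": 45, "nominee": "S. Devi"},
]

flagged = missing_nominee(accounts)
print([a["id"] for a in flagged])  # ['A2', 'A3']
print(senior_share(flagged))       # 0.5
```

At the scale of a bank's account base, the same filter would run as a query or a distributed job, but the logic of the check is unchanged.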
