Week-1 Introduction To BDDA-TWM PDF
COURSE OBJECTIVE
1. Understand the concepts, frameworks, opportunities, and challenges of
Big Data
2. Understand the concepts, theory, and frameworks of Data Analytics
activities
3. Be able to choose and perform Data Analytics activities based on the
context of the business problem
4. Be able to build descriptive and predictive models using
available data
INTRODUCTION TO BIG DATA
& DATA ANALYTICS
Week 1 – EBI3B4 Big Data & Data Analytics
OUTLINE
o Introduction, Background of Big Data
o Big Data, Data Analytics, Data Science
o Big Data Properties
o Exponential Data Growth – Data-Driven Decision Making
o Big Data Complexity
o Big Data Optimization and Trade-Offs
o Big Data Complexity Reduction Strategy
LARGE SCALE DATA
SOCIAL NETWORK DATA
BACKGROUND OF BIG DATA
1. We generate huge amounts of data (from UGC and mobile habits to machine-generated
data from sensors and IoT)
2. Our society leaves a massive digital footprint (and with it, our behavior and attitudes)
3. Finding unexpected patterns is exciting (and useful for predictive analytics)
4. In the 2020s, we are entering the AI era, which needs massive analytics efforts
Source: Superintelligence (Nick Bostrom)
o Big Data: A term for data sets that are so large or complex that
traditional data processing tools are inadequate to process them. The
challenges include analysis, capture, data curation, search, sharing,
storage, transfer, visualization, querying, updating, and information
privacy (Wikipedia)
o Data Analytics: The process of examining raw data with the
purpose of drawing conclusions about that information. Data
Analytics is used in many industries to allow companies and
organizations to make better business decisions, and in the sciences to
verify or disprove existing models or theories (Wikipedia)
For reference, see the video:
Big Data: https://www.youtube.com/watch?v=aC2CmTTZTVU
DEFINITION
Big Data, Data Analytics, Social Computing, Data Science
Volume, Variety, and Velocity are the "essential" characteristics of Big Data
EXPONENTIAL DATA
GROWTH
DATA DRIVEN DECISION MAKING
1. Data science involves principles, processes, and techniques for
understanding phenomena via the (automated) analysis of data
2. The ultimate goal of data science is improving decision making, as this
generally is of direct interest to business
3. Statistically, the more data-driven a firm is, the more productive it is,
even after controlling for a wide range of possible confounding factors. The
differences are not small: one standard deviation higher on the data-driven
decision making (DDD) scale is associated with a 4%–6% increase in productivity.
4. DDD is also correlated with higher return on assets, return on equity,
asset utilization, and market value, and the relationship seems to be
causal
https://www.plutora.com/blog/data-driven-decision-making
4 Types of Analytics to Create Business Opportunities
Data Analytics Example (in a supermarket) :
COMPLEXITY THEORY
References to watch:
https://www.youtube.com/watch?v=9YRw0Yk7N8c
https://www.youtube.com/watch?v=Du1q5oA7Cik
Optimization in a Computing Context:
Speed vs. Accuracy in Machine Learning Models
MOORE’S LAW
• Moore's law is the observation that the number of transistors in a dense
integrated circuit doubles approximately every two years.
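As a rough illustration of what "doubles approximately every two years" implies, the following minimal sketch projects a transistor count forward in time. The function name `projected_transistors` and the choice to count only whole two-year periods are assumptions made for this example, not part of the slide.

```c
#include <assert.h>

/* Hypothetical helper: projected transistor count after `years`,
 * assuming exactly one doubling per full two-year period. */
double projected_transistors(double initial_count, int years) {
    assert(years >= 0);
    double count = initial_count;
    for (int period = 0; period < years / 2; ++period) {
        count *= 2.0;  /* one doubling per two-year period */
    }
    return count;
}
```

Under this assumption, ten years means five doublings, so a chip with 1,000 transistors would be projected to have 32,000.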
COMPUTATIONAL
POWER
Choose what’s best for you
(or you may say Optimization)
LEVEL OF OPTIMIZATION
1. Design level
2. Algorithms and data structures (our interest for this course)
STRENGTH REDUCTION
• Computational tasks can be performed in several different ways with varying
efficiency. A more efficient version with equivalent functionality is known as
a strength reduction.
• For example, consider the following C code snippet whose intention is to obtain
the sum of all integers from 1 to N:
    int i, sum = 0;
    for (i = 1; i <= N; ++i) {
        sum += i;
    }
    printf("sum: %d\n", sum);
• This code can (assuming no arithmetic overflow) be rewritten using a
mathematical formula like:
    int sum = N * (1 + N) / 2;
    printf("sum: %d\n", sum);
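To compare the two versions directly, here is a minimal sketch that wraps each one in a function. The names `sum_loop` and `sum_formula` are illustrative, and the 64-bit unsigned type is an assumption chosen to push back the overflow limit mentioned above; it does not remove it.

```c
#include <stdint.h>

/* O(N) version: add every integer from 1 to n, one at a time. */
uint64_t sum_loop(uint64_t n) {
    uint64_t sum = 0;
    for (uint64_t i = 1; i <= n; ++i) {
        sum += i;
    }
    return sum;
}

/* O(1) version: closed-form formula n * (n + 1) / 2.
 * The even factor is halved first so the intermediate product
 * stays as small as possible. */
uint64_t sum_formula(uint64_t n) {
    if (n % 2 == 0) {
        return (n / 2) * (n + 1);
    }
    return n * ((n + 1) / 2);
}
```

Both functions return the same value for any n whose sum fits in 64 bits, but the loop does n additions while the formula does a constant amount of work.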
Strength Reduction should…
1. Minimize space / size
2. Minimize time
THINGS GROW FAST: EXPONENTIALLY
• Exponential growth is a phenomenon that occurs when the growth rate of the
value of a mathematical function is proportional to the function's current
value, resulting in its growth with time being an exponential function.
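The definition above can be written compactly as a differential equation. The symbols x(t) for the value, k for the proportionality constant, and x_0 for the starting value are notation assumed here, not taken from the slide.

```latex
\frac{dx}{dt} = k\,x(t), \quad k > 0
\quad\Longrightarrow\quad
x(t) = x_0\, e^{k t},
\qquad
T_{\text{double}} = \frac{\ln 2}{k}
```

The last expression is the doubling time: the value doubles after every interval of that length, regardless of how large it already is.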
BORROW BEST PRACTICES FROM
MANAGEMENT KNOWLEDGE
How To Reduce Complexity In Five Simple Steps
1. Clear the underbrush: get rid of ambiguous rules, low-value
activities, and time-wasters
2. Gain a clear perspective: focus on specific goals
3. Prioritize the most important things
4. Take the shortest path by eliminating loops and redundancies, and
make things leaner
5. Reduce levels
GRAPH DATABASE to Represent Complex
Relationships in Data
• A graph database is a database that uses graph structures for semantic queries, with nodes, edges, and properties to
represent and store data. A key concept of the system is the graph (or edge or relationship), which directly relates
data items in the store. The relationships allow data in the store to be linked together directly and, in most cases,
retrieved with a single operation.
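The node, edge, and property idea can be sketched in a few lines of C. This is only an in-memory illustration under simplifying assumptions: the type and function names (`Graph`, `graph_add_node`, `graph_add_edge`, `graph_neighbors`) are hypothetical, capacities are fixed, and edges are found by scanning a flat list, whereas a real graph database would keep per-node adjacency so the lookup does not touch unrelated edges.

```c
#include <assert.h>
#include <string.h>

#define MAX_NODES 8
#define MAX_EDGES 16

/* A node carries properties; here just a name. */
typedef struct {
    const char *name;
} Node;

/* An edge directly relates two nodes and carries a label. */
typedef struct {
    int from;
    int to;
    const char *label;
} Edge;

typedef struct {
    Node nodes[MAX_NODES];
    Edge edges[MAX_EDGES];
    int node_count;
    int edge_count;
} Graph;

/* Add a node and return its index. */
int graph_add_node(Graph *g, const char *name) {
    assert(g->node_count < MAX_NODES);
    g->nodes[g->node_count].name = name;
    return g->node_count++;
}

/* Add a directed, labelled edge between two existing nodes. */
void graph_add_edge(Graph *g, int from, int to, const char *label) {
    assert(g->edge_count < MAX_EDGES);
    assert(from >= 0 && from < g->node_count);
    assert(to >= 0 && to < g->node_count);
    g->edges[g->edge_count++] = (Edge){from, to, label};
}

/* Collect the targets of all edges leaving `from` with the given label.
 * Returns how many indices were written to `out` (at most `out_cap`). */
int graph_neighbors(const Graph *g, int from, const char *label,
                    int *out, int out_cap) {
    int found = 0;
    for (int i = 0; i < g->edge_count && found < out_cap; ++i) {
        if (g->edges[i].from == from &&
            strcmp(g->edges[i].label, label) == 0) {
            out[found++] = g->edges[i].to;
        }
    }
    return found;
}
```

For example, after adding nodes for Alice, Bob, and Carol and two "FOLLOWS" edges from Alice, a single call to `graph_neighbors` with Alice's index returns both related nodes, with no join across separate tables.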
[Figure: conventional/legacy RDBMS vs. graph database]
Video: https://www.youtube.com/watch?v=GM9bB4ytGao
CASE STUDY
• Big Data at Verizon Wireless (Phoenix Suns)
• Big Data at Schneider National
• Big Data at UPS
• Big Data at United Healthcare
• Big Data at Macys.com
• Big Data at Bank of America
• Big Data at Citigroup
Read Book:
Big Data at Work, Chapter: What You Can Learn from Large Companies: Big Data and Analytics 3.0
Davenport, T. (2014). Big Data at Work: Dispelling the Myths, Uncovering the Opportunities. Harvard Business Review Press.
Use Case
• GOJEK: Predict a person's favorite food, even if that person has
never ordered from a particular restaurant.
• Veritrans (Midtrans): Detect fraudulent transactions among millions of
e-commerce transactions very quickly, using clustered network analysis
based on three-way customer information (credit card,
phone number, and email address).
• Modalku: Create an algorithm to assess whether a particular person or SME
is eligible to borrow money or receive investment.
• Bank Mandiri: Use customer data to understand customers (wallet
size), improve lead management (targeted customers), and detect fraudulent
transactions (using network analysis).
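The network-analysis idea behind the fraud detection use cases can be illustrated with a small sketch: treat transactions as nodes, link any two that share a credit card, phone number, or email address, and group the linked ones into clusters. This is not the method used by any of the companies above. The record layout, the function names, the union-find approach, and the pairwise comparison are all assumptions made for illustration, and a production system handling millions of transactions would index identifiers rather than compare every pair.

```c
#include <assert.h>
#include <string.h>

#define MAX_TX 16

/* Hypothetical transaction record with three customer identifiers. */
typedef struct {
    const char *card;
    const char *phone;
    const char *email;
} Transaction;

static int parent[MAX_TX];

/* Find the representative of a cluster (with path halving). */
static int find_root(int i) {
    while (parent[i] != i) {
        parent[i] = parent[parent[i]];
        i = parent[i];
    }
    return i;
}

/* Merge the clusters containing a and b. */
static void unite(int a, int b) {
    int ra = find_root(a);
    int rb = find_root(b);
    if (ra != rb) {
        parent[rb] = ra;
    }
}

/* Two transactions are linked if they share any identifier. */
static int shares_identifier(const Transaction *a, const Transaction *b) {
    return strcmp(a->card, b->card) == 0 ||
           strcmp(a->phone, b->phone) == 0 ||
           strcmp(a->email, b->email) == 0;
}

/* Cluster n transactions; cluster_of[i] receives the cluster id of
 * transaction i. Returns the number of distinct clusters. */
int cluster_transactions(const Transaction *tx, int n, int *cluster_of) {
    assert(n >= 0 && n <= MAX_TX);
    for (int i = 0; i < n; ++i) {
        parent[i] = i;
    }
    for (int i = 0; i < n; ++i) {
        for (int j = i + 1; j < n; ++j) {
            if (shares_identifier(&tx[i], &tx[j])) {
                unite(i, j);
            }
        }
    }
    int clusters = 0;
    for (int i = 0; i < n; ++i) {
        cluster_of[i] = find_root(i);
        if (cluster_of[i] == i) {
            ++clusters;
        }
    }
    return clusters;
}
```

An unusually large cluster, for instance many different cards tied together through one phone number, is the kind of pattern an analyst might then flag for review.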
Case Study : Customer Voice (Telco)
Telkomsel XL
Network Text Analysis to Summarize Online Conversations for Marketing Intelligence Efforts in Telecommunication Industry (Alamsyah et al., 2016)
Case Study : Predict Travel Price
• https://www.academia.edu/28776805/Prediction_Models_Based_on_Flight_Tickets_and_Hotel_Rooms_Data_Sales_for_Recommendation_System_in_Online_Travel_Agent_Business
To wrap up, Why Big Data Analytics Matters
1. NEW DATA
Example: eCommerce capturing clickstream
2. UNLOCKING VALUE
Example: Sentiment analysis from online social network
3. SHAPING THE FUTURE
Example: modelling the future, anticipating & influencing
Assignment Week 1
Find a case study of a Big Data implementation / application in business
or another field
o Download the template (*.doc) for the Week 1 assignment
o State the objective, problems, and solution idea
o Upload it as a PDF file, as requested, by the due date