Lecture-1
Introduction and Background
Reference Books
► W. H. Inmon, Building the Data Warehouse (Second Edition), John Wiley & Sons Inc., NY.
► A. Abdullah, “Data Warehousing for Beginners: Concepts & Issues” (First Edition).
► Paulraj Ponniah, Data Warehousing Fundamentals, John Wiley & Sons Inc., NY.
Additional Material
► Research Papers
At The End of the Course
Develop an application for an organization
of your choice.
Approach of the course
► Develop an understanding of underlying RDBMS concepts.
► Apply these concepts to VLDB DSS environments and understand where and why they break down.
► Expose the differences between RDBMS and Data Warehouse in the context of VLDB.
► Provide the basics of DSS tools such as OLAP and Data Mining, and demonstrate their application.
The need
(Figure: a hierarchy, listed top to bottom: $, POWER, INTELLIGENCE, KNOWLEDGE, INFORMATION, DATA.)
Historical overview
► 1960: Master files & reports
► 1965: Lots of master files!
► 1970: Direct access memory & DBMS
► 1975: Online high-performance transaction processing
► 1980: PCs and 4GL technology (MIS/DSS)
► 1985 & 1990: Extract programs, extract processing, the legacy system’s web
Historical overview: Crisis of Credibility
What is the financial health of our company?
(Figure: the same question yields conflicting answers, -10% and +10%.)
Why a Data Warehouse (DWH)?
► Data recording and storage are growing.
► Intelligent decision support is required for decision-making.
Reason-1: Why a Data Warehouse?
► Data sets are growing.
► The size of data sets is going up.
► The cost of data storage is coming down.
Reason-1: Why a Data Warehouse?
A Few Examples
► WalMart: 24 TB
► France Telecom: ~100 TB
► CERN: Up to 20 PB by 2006
► Stanford Linear Accelerator Center (SLAC): 500 TB
Caution!
► A Warehouse of Data is NOT a Data Warehouse.
► Size is NOT everything.
Reason-2: Why a Data Warehouse?
DBMS Approach
► List of all items that were sold last month?
Stages of Data Warehouse
► What happened?
► Why did it happen?
► What will happen?
► What is happening?
► What do you want to happen?
What is a Data Warehouse?
► Complete repository
► History
► Transaction System
► Ad-Hoc access
► Knowledge workers
What is a Data Warehouse?
► Transaction System
   Management Information System (MIS).
   Could be typed sheets (NOT a transaction system).
► Ad-Hoc access
   Does not have a fixed access pattern.
   Queries are not known in advance.
   Difficult to write SQL in advance.
► Knowledge workers
   Typically NOT IT literate (executives, analysts, managers).
   NOT clerical workers.
   Decision makers.
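Because ad-hoc queries are not known in advance, the SQL often has to be assembled when the question is asked. The following is a minimal sketch of that idea using Python's built-in sqlite3 module. The `sales` table, its columns, the sample rows and the helper `ad_hoc_total` are all hypothetical, chosen only for illustration.

```python
import sqlite3

# Hypothetical sales table; all names and figures are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (region TEXT, product TEXT, month TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("North", "Tea", "2004-01", 100.0),
        ("North", "Rice", "2004-01", 250.0),
        ("South", "Tea", "2004-01", 80.0),
        ("South", "Tea", "2004-02", 120.0),
    ],
)

ALLOWED_DIMENSIONS = {"region", "product", "month"}


def ad_hoc_total(conn, group_by):
    """Total sales grouped by a dimension chosen at query time."""
    if group_by not in ALLOWED_DIMENSIONS:
        raise ValueError(f"unknown dimension: {group_by}")
    # Column names cannot be bound as parameters, so the name is checked
    # against a fixed set before it is placed in the SQL text.
    sql = (
        f"SELECT {group_by}, SUM(amount) FROM sales "
        f"GROUP BY {group_by} ORDER BY {group_by}"
    )
    return conn.execute(sql).fetchall()


# The same function answers questions nobody wrote SQL for beforehand.
by_region = ad_hoc_total(conn, "region")
by_product = ad_hoc_total(conn, "product")
print(by_region)
print(by_product)
```

In this sketch the knowledge worker only picks a dimension; the query text is generated for them, which is roughly what query and reporting tools do on a larger scale.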
Another View of a DWH
► Subject-Oriented
► Integrated
► Time-Variant
► Non-Volatile
What is a Data Warehouse?
It is a blend of many technologies, the basic
concept being:
How is it Different?
► Fundamentally different
(Figure: the traditional reporting cycle. Business user needs info → user requests IT people → IT people do system analysis and design → IT people create reports → IT people send reports to business user → business user may get answers → answers result in more questions.)
How is it Different?
► Different patterns of hardware utilization
(Figure: hardware utilization, on a 0% to 100% axis, for Operational vs. DWH systems.)
How much history?
► Depends on:
Industry.
Cost of storing historical data.
How much history?
► Industries and history
   Telecom: calls are far more numerous than bank transactions, so 18 months of history.
How much history?
Is a Data Warehouse a complete repository of data?
How is it Different?
► Usually (but not always) periodic or batch
updates rather than real-time.
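One way to picture a periodic batch update is an incremental load that copies only the rows added since the previous run. The sketch below uses Python's built-in sqlite3 module. The tables `op_tx` and `dwh_tx`, the helper `batch_load` and the sample rows are hypothetical, and using the highest loaded `tx_id` as a high-water mark is one assumed strategy among several.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical operational table and warehouse table.
conn.execute(
    "CREATE TABLE op_tx (tx_id INTEGER PRIMARY KEY, cust_id INTEGER, amount REAL)"
)
conn.execute("CREATE TABLE dwh_tx (tx_id INTEGER, cust_id INTEGER, amount REAL)")


def count_rows(conn, table):
    # The table name comes only from this script, never from user input.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def batch_load(conn):
    """Copy rows added to op_tx since the previous batch into dwh_tx."""
    before = count_rows(conn, "dwh_tx")
    # High-water mark: the largest tx_id already present in the warehouse.
    last_loaded = conn.execute(
        "SELECT COALESCE(MAX(tx_id), 0) FROM dwh_tx"
    ).fetchone()[0]
    conn.execute(
        "INSERT INTO dwh_tx "
        "SELECT tx_id, cust_id, amount FROM op_tx WHERE tx_id > ?",
        (last_loaded,),
    )
    conn.commit()
    return count_rows(conn, "dwh_tx") - before


# Day 1: transactions reach the operational system one at a time.
conn.executemany(
    "INSERT INTO op_tx (cust_id, amount) VALUES (?, ?)", [(1, 10.0), (2, 25.0)]
)
first_batch = batch_load(conn)   # nightly run picks up both rows

# Day 2: one more transaction, then the next nightly run.
conn.executemany("INSERT INTO op_tx (cust_id, amount) VALUES (?, ?)", [(1, 5.0)])
second_batch = batch_load(conn)

# Running the batch again with no new activity loads nothing.
third_batch = batch_load(conn)
total_rows = count_rows(conn, "dwh_tx")
```

Between runs the warehouse lags behind the operational table, which is the trade-off a batch schedule accepts in exchange for not loading on every transaction.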
How is it Different?
► Starts with a 6x12 availability requirement ... but 7x24 usually becomes the goal.
   Decision makers typically don’t work 24 hours a day, 7 days a week. An ATM system does.
   For businesses across the globe, 50% of the world may be sleeping at any one time, but the businesses are up 100% of the time.
How is it Different?
► Does not follow the traditional development model.
Classical SDLC (Requirements → Program)
   Requirements gathering
   Analysis
   Design
   Programming
   Testing
   Integration
   Implementation
How is it Different?
► Does not follow the traditional development model.
DWH SDLC (CLDS) (DWH → Program → Requirements)
   Implement warehouse
   Integrate data
   Test for bias
   Program w.r.t. data
   Design DSS system
   Analyze results
   Understand requirements
Data Warehouse Vs. OLTP
DWH
SELECT balance, age, sal, gender
FROM customer_table, tx_table
WHERE age BETWEEN 30 AND 40
  AND education = 'graduate'
  AND customer_table.CustID = tx_table.Customer_ID;
Data Warehouse Vs. OLTP
OLTP                               DWH
Primary key used                   Primary key NOT used
No concept of Primary Index        Primary index used
Few rows returned                  Many rows returned
May use a single table             Uses multiple tables
High selectivity of query          Low selectivity of query
Indexing on primary key (unique)   Indexing on primary index (non-unique)
Data Warehouse Vs. OLTP
OLTP: OnLine Transaction Processing (MIS or Database System)
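The contrast can be tried out with a small in-memory database. The sketch below uses Python's built-in sqlite3 module and the table and column names from the DWH query on the earlier slide. The `tx_id` column, the index name `idx_customer_age` and every sample row are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_table (
    CustID INTEGER PRIMARY KEY, age INTEGER, sal REAL,
    gender TEXT, education TEXT);
CREATE TABLE tx_table (
    tx_id INTEGER PRIMARY KEY, Customer_ID INTEGER, balance REAL);
-- Non-unique index on the column the analytical query filters by.
CREATE INDEX idx_customer_age ON customer_table (age);
""")
conn.executemany(
    "INSERT INTO customer_table VALUES (?, ?, ?, ?, ?)",
    [
        (1, 35, 50000.0, "F", "graduate"),
        (2, 32, 42000.0, "M", "graduate"),
        (3, 55, 90000.0, "M", "graduate"),
        (4, 38, 30000.0, "F", "matric"),
    ],
)
conn.executemany(
    "INSERT INTO tx_table VALUES (?, ?, ?)",
    [(10, 1, 1200.0), (11, 1, 900.0), (12, 2, 300.0),
     (13, 3, 7000.0), (14, 4, 50.0)],
)

# OLTP style: one customer, fetched by primary key from a single table.
oltp_rows = conn.execute(
    "SELECT age, sal, gender FROM customer_table WHERE CustID = ?", (2,)
).fetchall()

# DWH style: range and category filters, joined across two tables.
dwh_rows = conn.execute("""
    SELECT balance, age, sal, gender
    FROM customer_table
    JOIN tx_table ON customer_table.CustID = tx_table.Customer_ID
    WHERE age BETWEEN 30 AND 40 AND education = 'graduate'
    ORDER BY tx_table.tx_id
""").fetchall()

print(oltp_rows)  # a single row
print(dwh_rows)   # several rows
```

Even with five transactions the pattern from the table shows up: the keyed lookup returns one row from one table, while the analytical query returns several rows from a join.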
Putting the pieces together
(Figure: data warehouse architecture, left to right.)
► Data sources: operational databases, archived data, www data
► Extract, Transform, Load (ETL) into the warehouse, with metadata; IT users
► Data marts: MOLAP, ROLAP
► Tools: query/reporting, data analysis, data mining
► Business users
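The Extract, Transform, Load step can be sketched in a few lines. The example below is a toy illustration in Python with the built-in sqlite3 module, not a description of any particular ETL tool. The two source record layouts, the `GENDER_CODES` mapping, the `warehouse` table and the functions `extract`, `transform` and `load` are all hypothetical.

```python
import sqlite3

# Two hypothetical sources that encode the same facts differently.
branch_a = [{"cust": "Ali", "gender": "M", "amount_pkr": "1500"}]
branch_b = [{"customer_name": "Sara", "sex": "female", "amount": 2200.0}]

GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def extract():
    """Yield raw records tagged with the source they came from."""
    for rec in branch_a:
        yield "A", rec
    for rec in branch_b:
        yield "B", rec


def transform(source, rec):
    """Map a source-specific record onto one common warehouse layout."""
    if source == "A":
        name, gender, amount = rec["cust"], rec["gender"], rec["amount_pkr"]
    else:
        name, gender, amount = rec["customer_name"], rec["sex"], rec["amount"]
    return (name, GENDER_CODES[gender.lower()], float(amount), source)


def load(conn, rows):
    """Insert the cleaned rows into the warehouse table."""
    conn.executemany("INSERT INTO warehouse VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse (name TEXT, gender TEXT, amount REAL, source TEXT)"
)
load(conn, [transform(src, rec) for src, rec in extract()])

loaded = conn.execute(
    "SELECT name, gender, amount, source FROM warehouse ORDER BY name"
).fetchall()
print(loaded)
```

The transform step is where integration happens in this sketch: differing field names and gender encodings from the sources are reduced to a single representation before loading.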