Lecture 1
Lecture # 1-2
Data Warehouse Books
Book: The Data Warehouse Toolkit, by Ralph Kimball, 2013
Book: Building the Data Warehouse, by W. H. Inmon, Fourth Edition, John Wiley & Sons, 2005
A producer wants to know…
– Which are our lowest/highest margin customers?
– Who are my customers and what products are they buying?
– What is the most effective distribution channel?
fig:- DSS components (user, user interface, DSS software system, database)
USER INTERFACE: how the user enters a problem and receives answers
DSS SOFTWARE SYSTEM: models, OLAP tools, data mining tools
DATABASE: current data from applications or groups
DATA MINING: technology for finding relationships in large databases for prediction
Why do we use a DSS?
Increasing complexity of decisions
– Technology
– Information:
“Data, data everywhere, and not the time to think!”
– Number and complexity of options
– Pace of change
Increasing availability of computerized support
– Inexpensive high-powered computing
– Better software
– More efficient software development process
Increasing usability of computers
Operational Database
Operational database management systems are used to
manage dynamic data in real-time.
Data warehouse Introduction
“A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision-making process.” – W. H. Inmon
fig:- Data warehouse properties: subject oriented, integrated, time variant, non-volatile
Data warehouse Usage
Three kinds of data warehouse applications
– Information processing
supports querying, basic statistical analysis, and reporting using
crosstabs, tables, charts and graphs
– Analytical processing
multidimensional analysis of data warehouse data
supports basic OLAP operations, slice-dice, drilling, pivoting
– Data mining
knowledge discovery from hidden patterns
supports associations, constructing analytical models, performing
classification and prediction, and presenting the mining results
using visualization tools.
Differences among the three tasks
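The OLAP operations listed under analytical processing above can be illustrated on a tiny in-memory fact table. This is only a sketch in plain Python, not the API of any OLAP server. The `sales` rows and the helper names `slice_`, `dice`, `roll_up` and `pivot` are hypothetical, and drilling is shown only in its roll-up direction.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, region, product, amount)
sales = [
    (2023, "East", "Pen", 100),
    (2023, "West", "Pen", 150),
    (2023, "East", "Ink", 40),
    (2024, "East", "Pen", 120),
    (2024, "West", "Ink", 60),
]
DIMS = {"year": 0, "region": 1, "product": 2}

def slice_(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    return [r for r in rows if r[DIMS[dim]] == value]

def dice(rows, **allowed):
    """Dice: keep rows whose dimensions fall in the allowed value sets."""
    return [r for r in rows
            if all(r[DIMS[d]] in vals for d, vals in allowed.items())]

def roll_up(rows, *dims):
    """Roll-up (drilling up): total the amount over the listed dimensions."""
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[DIMS[d]] for d in dims)] += r[3]
    return dict(totals)

def pivot(rows, row_dim, col_dim):
    """Pivot: cross-tabulate the amount by two dimensions."""
    table = defaultdict(lambda: defaultdict(int))
    for r in rows:
        table[r[DIMS[row_dim]]][r[DIMS[col_dim]]] += r[3]
    return {k: dict(v) for k, v in table.items()}
```

For example, `roll_up(sales, "year")` totals the amount per year, and `pivot(sales, "region", "product")` produces the kind of crosstab that information processing reports use.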
Data warehouse: Subject Oriented
Organized around major subjects, such as customer, product,
sales.
fig:- Subject orientation: operational systems vs. data warehouse
Data warehouse: Integrated
Constructed by integrating multiple, heterogeneous
data sources
– relational databases, flat files, on-line transaction records
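As a rough illustration of integrating heterogeneous sources, the sketch below merges a relational table and a flat file into one consistent record format. It assumes, purely for illustration, that the two sources disagree on field names and on how gender is coded. The table, file contents and function names are all hypothetical.

```python
import csv
import io
import sqlite3

# Source 1 (hypothetical): a relational table that codes gender as 'm'/'f'.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customer (id INTEGER, name TEXT, gender TEXT)")
db.executemany("INSERT INTO customer VALUES (?, ?, ?)",
               [(1, "Ali", "m"), (2, "Sara", "f")])

# Source 2 (hypothetical): a flat file that codes gender as 1/0.
flat_file = io.StringIO("cust_no,full_name,sex\n3,Omar,1\n4,Hina,0\n")

def from_relational(conn):
    """Read the relational source and emit records in the warehouse format."""
    query = "SELECT id, name, gender FROM customer ORDER BY id"
    for cid, name, gender in conn.execute(query):
        yield {"customer_id": cid, "name": name, "gender": gender.upper()}

def from_flat_file(handle):
    """Read the flat file and emit records in the same warehouse format."""
    code = {"1": "M", "0": "F"}
    for row in csv.DictReader(handle):
        yield {"customer_id": int(row["cust_no"]),
               "name": row["full_name"],
               "gender": code[row["sex"]]}

# One integrated collection with a single naming and encoding convention.
integrated = list(from_relational(db)) + list(from_flat_file(flat_file))
```

After integration every record has the same three fields and the same gender coding, regardless of which source it came from.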
Data warehouse: Time Varying
The time horizon for the data warehouse is significantly longer
than that of operational systems.
– Operational database: current value data.
– Data warehouse data: provide information from a historical
perspective (e.g., past 5-10 years)
Every key structure in the data warehouse
– Contains an element of time, explicitly or implicitly
– But the key of operational data may or may not
contain “time element”.
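The difference in key structure can be sketched with two dictionaries: an operational store keyed by account only, and a warehouse store whose key also carries a date. The account data and the helper names `record_balance` and `balance_as_of` are hypothetical and serve only to illustrate the idea.

```python
from datetime import date

# Operational store (hypothetical): keyed by account only, holds the current value.
operational = {}
# Warehouse store (hypothetical): keyed by (account, snapshot date), keeps history.
warehouse = {}

def record_balance(account_id, balance, as_of):
    operational[account_id] = balance           # overwrites the old value
    warehouse[(account_id, as_of)] = balance    # adds a new dated snapshot

record_balance("A-1", 500, date(2023, 1, 31))
record_balance("A-1", 650, date(2023, 2, 28))

def balance_as_of(account_id, as_of):
    """Return the latest snapshot on or before `as_of`, or None if there is none."""
    dated = [(d, b) for (a, d), b in warehouse.items()
             if a == account_id and d <= as_of]
    return max(dated)[1] if dated else None
```

The operational store can only answer "what is the balance now", while the warehouse store can also answer "what was the balance on a given date".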
fig:- Time variance: operational systems vs. data warehouse
Data warehouse: Non-Volatile
A physically separate store of data transformed from the
operational environment.
Operational update of data does not occur in the
data warehouse environment.
– Does not require transaction processing,
recovery, and concurrency control mechanisms
– Requires only two operations in data accessing:
initial loading of data and access of data.
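The two operations, initial loading and access, can be imitated with SQLite. In this sketch one connection loads the data and a second, read-only connection queries it. SQLite's `mode=ro` URI option is used here only as a stand-in for a non-volatile store and is not how a warehouse product enforces it. The file name, table and `try_update` helper are hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path

path = Path(tempfile.mkdtemp()) / "warehouse.db"

# Operation 1: initial loading of data.
loader = sqlite3.connect(path)
loader.execute("CREATE TABLE sales (day TEXT, amount INTEGER)")
loader.executemany("INSERT INTO sales VALUES (?, ?)",
                   [("2024-01-01", 100), ("2024-01-02", 250)])
loader.commit()
loader.close()

# Operation 2: access of data, through a read-only connection.
reader = sqlite3.connect(path.as_uri() + "?mode=ro", uri=True)
total = reader.execute("SELECT SUM(amount) FROM sales").fetchone()[0]

def try_update(conn):
    """Return True if an in-place update succeeds, False if it is rejected."""
    try:
        conn.execute("UPDATE sales SET amount = 0")
        return True
    except sqlite3.OperationalError:
        return False

update_allowed = try_update(reader)
```

Queries through `reader` work as usual, while the attempted operational-style update is rejected.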
fig:- Non-volatility: operational data is changed record by record (insert, change, delete, replace); warehouse data is loaded and then accessed read-only
Data, Data everywhere yet ...
• I can’t find the data I need
– data is scattered over the network
– many versions, subtle differences
• I can’t get the data I need
– need an expert to get the data
• I can’t understand the data I found
– available data poorly documented
Difference between Database and data warehouse
FEATURES | DATABASE | DATA WAREHOUSE
Characteristic | It is based on operational processing. | It is based on informational processing.
Data | It mainly stores current data, which is always guaranteed to be up to date. | It usually stores historical data, whose accuracy is maintained over time.
User | Common users are clerks, DBAs, and database professionals. | Common users are knowledge workers (e.g., managers, executives, analysts).
Unit of work | Its work consists of short, simple transactions. | Its operations consist of complex queries.
Summarization | The data is primitive and highly detailed. | The data is summarized and consolidated.
View | The view of the data is flat relational. | The view of the data is multidimensional.
Difference between Database and data warehouse (continued)
FEATURES | DATABASE | DATA WAREHOUSE
Function | It is used for day-to-day operations. | It is used for long-term informational requirements and decision support.
Access | The most frequent type of access is read/write. | It mostly uses read access to the stored data.
Operations | The main operation is an index/hash lookup on the primary key. | Most operations require many scans.
Number of records accessed | A few tens of records. | Millions of records.
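The contrast between a primary-key lookup and a scan-heavy query can be seen on a small SQLite table. This is an illustrative sketch only: the `orders` table, its contents and the `plan` helper are hypothetical, and the exact wording of SQLite's query plans differs between versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, "East" if i % 2 else "West", i * 10) for i in range(1, 1001)],
)

# Database-style access: fetch one record through the primary key.
lookup_sql = "SELECT amount FROM orders WHERE order_id = ?"
one_amount = conn.execute(lookup_sql, (42,)).fetchone()[0]

# Warehouse-style access: aggregate over every record in the table.
scan_sql = "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
totals = dict(conn.execute(scan_sql).fetchall())

def plan(sql, params=()):
    """Return SQLite's textual query plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(row[-1] for row in rows)
```

Calling `plan(lookup_sql, (42,))` reports a search on the primary key, while `plan(scan_sql)` reports a scan of the whole table.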
fig:- Sources -> Integration -> Warehouse (with Metadata) -> Query & Analysis
Data Warehouse: A Practitioner's Viewpoint
“A data warehouse is simply a single, complete, and
consistent store of data obtained from a variety of
sources and made available to end users in a way they
can understand and use in a business context.”
-- Barry Devlin, IBM Consultant
Data Warehouse Architectures: Conceptual View
Single-layer
fig:- Operational systems and informational systems
Two-layer
– Real-time + derived data
– Most commonly used approach in industry today
fig:- Operational systems and informational systems over derived data and real-time data
Three-layer Architecture: Conceptual View
Transformation of real-time data to derived data
really requires two steps
fig:- Operational systems and informational systems over three data layers
– Derived data: view level, “particular informational needs”
– Reconciled data: physical implementation of the data warehouse
– Real-time data
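The two-step transformation can be sketched as two small functions, one per step. The sample records and the function names `reconcile` and `derive` are hypothetical. The sketch assumes that reconciliation means bringing names and types to one convention, and that derivation means shaping the data for one informational need.

```python
from collections import defaultdict

# Real-time data (hypothetical): raw records as two operational systems hold them.
real_time = [
    {"sys": "pos", "cust": " ali ", "amt": "100.0"},
    {"sys": "web", "cust": "ALI", "amt": "50"},
    {"sys": "web", "cust": "Sara", "amt": "70"},
]

def reconcile(records):
    """Step 1: real-time data -> reconciled data (consistent names and types)."""
    return [{"customer": r["cust"].strip().title(), "amount": float(r["amt"])}
            for r in records]

def derive(reconciled_rows):
    """Step 2: reconciled data -> derived data (one particular informational need)."""
    totals = defaultdict(float)
    for r in reconciled_rows:
        totals[r["customer"]] += r["amount"]
    return dict(totals)

reconciled = reconcile(real_time)
spend_per_customer = derive(reconciled)
```

Keeping the reconciled layer separate means other derived views can be built from it without re-reading the operational systems.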
Data Warehousing: Two Distinct Issues
(1) How to get information into warehouse
“Data warehousing”
(2) What to do with data once it’s in warehouse
“Warehouse DBMS”
Both rich research areas
Industry has focused on (2)
Issues in Data Warehousing
Warehouse Design
Extraction
– Wrappers, monitors (change detectors)
Integration
– Cleansing & merging
Warehousing specification & Maintenance
Optimizations
Miscellaneous (e.g., evolution)
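One simple way a monitor (change detector) could work is to compare two snapshots of a source table. The slides do not say which technique is meant, so the snapshot comparison below is an assumed illustration. The function `detect_changes` and the sample data are hypothetical.

```python
def detect_changes(old_snapshot, new_snapshot):
    """Compare two snapshots keyed by primary key and report the differences."""
    old_keys, new_keys = old_snapshot.keys(), new_snapshot.keys()
    inserted = {k: new_snapshot[k] for k in new_keys - old_keys}
    deleted = {k: old_snapshot[k] for k in old_keys - new_keys}
    updated = {k: (old_snapshot[k], new_snapshot[k])
               for k in old_keys & new_keys
               if old_snapshot[k] != new_snapshot[k]}
    return {"inserted": inserted, "deleted": deleted, "updated": updated}

# Hypothetical source table at two points in time: key -> (name, city)
yesterday = {1: ("Ali", "Lahore"), 2: ("Sara", "Karachi"), 3: ("Omar", "Quetta")}
today = {1: ("Ali", "Multan"), 2: ("Sara", "Karachi"), 4: ("Hina", "Lahore")}

changes = detect_changes(yesterday, today)
```

Only the detected changes then need to be passed on to integration, rather than the whole source table.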
OLTP vs. OLAP
• OLTP: On-Line Transaction Processing
– Describes processing at operational sites
fig:- A three-tier data warehousing architecture (middle tier; bottom tier: data warehouse server, back-end tools)
1) Bottom tier:- The bottom tier is a warehouse database
server that is almost always a relational database system.
Back-end tools and utilities feed data into the bottom tier
from operational databases or other external sources. These
tools and utilities perform data extraction, cleaning, and
transformation, as well as load and refresh functions to
update the data warehouse.
The data are extracted using application program
interfaces known as gateways.
Examples of gateways are ODBC (Open Database
Connectivity) and OLE DB (Object Linking and Embedding,
Database) by Microsoft, and JDBC (Java Database
Connectivity).
This tier also contains a metadata repository, which stores
information about the data warehouse and its contents.
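The back-end functions described above (extraction, cleaning, transformation, load) and the metadata repository can be sketched with two in-memory SQLite databases. Python's `sqlite3` module stands in for a gateway such as ODBC or JDBC. All table names, sample rows and function names are hypothetical, and refresh is not shown.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical operational source and warehouse, both in-memory SQLite databases.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount TEXT)")
source.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                   [(1, " ali ", "100"), (2, "SARA", "n/a"), (3, "omar", "75.5")])

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE fact_orders (id INTEGER, customer TEXT, amount REAL)")
warehouse.execute(
    "CREATE TABLE metadata (table_name TEXT, loaded_at TEXT, row_count INTEGER)"
)

def extract(conn):
    """Extraction: read rows out of the operational source."""
    return conn.execute("SELECT id, customer, amount FROM orders ORDER BY id").fetchall()

def clean(rows):
    """Cleaning: drop rows whose amount is not numeric."""
    good = []
    for oid, customer, amount in rows:
        try:
            good.append((oid, customer, float(amount)))
        except ValueError:
            continue
    return good

def transform(rows):
    """Transformation: bring customer names to one convention."""
    return [(oid, customer.strip().title(), amount) for oid, customer, amount in rows]

def load(conn, rows):
    """Load: write the rows and record the load in the metadata repository."""
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.execute("INSERT INTO metadata VALUES (?, ?, ?)",
                 ("fact_orders", datetime.now(timezone.utc).isoformat(), len(rows)))
    conn.commit()

load(warehouse, transform(clean(extract(source))))
loaded = warehouse.execute("SELECT * FROM fact_orders ORDER BY id").fetchall()
```

The `metadata` table here holds only a table name, a load time and a row count. A real metadata repository would describe far more about the warehouse and its contents.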
2) Middle tier:- The middle tier is an OLAP server
that is typically implemented using either:-