DWM Unit-I Notes

The document discusses the key concepts of data warehousing including: 1) It provides definitions of a data warehouse as a subject-oriented, integrated, non-volatile collection of data to support management decisions. 2) Data warehouses have features such as being subject-oriented, integrated, time-variant and non-volatile. 3) It describes the differences between a data warehouse and a data mart, and the typical three-tier architecture of a data warehouse including bottom, middle and top tiers.

Uploaded by

Ankita Pawar

UNIT-I

INTRODUCTION TO DATA WAREHOUSING

Evolution of decision support systems, Failure of past decision support
systems, Operational v/s decision support systems, Data warehousing
lifecycle, Architecture, Building blocks, Components of DW, Data Marts
and Metadata

Prof.Jayant S. Rohankar
Tulsiramji Gaikwad-Patil College of Engineering & Technology,
Nagpur
Department of Information Technology
Subject Notes
Academic Session: 2022 – 2023

Subject: Data Warehousing & Mining Semester: VII

Unit I

Data Warehouse:

1. A data warehouse is a subject-oriented, integrated, nonvolatile, and
time-variant collection of data in support of management’s decisions.
2. A data warehouse is a semantically consistent data store that serves as a
physical implementation of a decision support data model and stores the
information on which an enterprise needs to make strategic decisions. A
data warehouse is also often viewed as an architecture, constructed by
integrating data from multiple heterogeneous sources to support
structured and/or ad hoc queries, analytical reporting, and decision
making.

Features of Data warehouse:

Subject-oriented:

 A data warehouse is organized around major subjects, such as customer,
supplier, product, and sales. Rather than concentrating on the day-to-day
operations and transaction processing of an organization, a data
warehouse focuses on the modeling and analysis of data for decision
makers.

 Hence, data warehouses typically provide a simple and concise view
around particular subject issues by excluding data that are not useful in
the decision support process.

Integrated:

 A data warehouse is usually constructed by integrating multiple
heterogeneous sources, such as relational databases, flat files, and
online transaction records.
 Data cleaning and data integration techniques are applied to ensure
consistency in naming conventions, encoding structures, attribute
measures, and so on.
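
The integration step above can be illustrated with a minimal sketch. It assumes two hypothetical source systems (a CRM and a billing system) whose field names, gender codes, and height units differ; all names and values are invented for illustration and do not come from any real system.

```python
# Hypothetical records from two source systems that describe the same
# attributes with different names, codes, and units (illustrative only).
crm_rows = [{"cust_id": 1, "gender": "M", "height_cm": 180.0}]
billing_rows = [{"customer": 2, "sex": "female", "height_in": 65.0}]

# One agreed encoding for the warehouse.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def from_crm(row):
    """Map a CRM record onto the warehouse naming and encoding."""
    return {
        "customer_id": row["cust_id"],
        "gender": GENDER_CODES[row["gender"].lower()],
        "height_cm": row["height_cm"],
    }


def from_billing(row):
    """Rename fields, recode gender, and convert inches to centimetres."""
    return {
        "customer_id": row["customer"],
        "gender": GENDER_CODES[row["sex"].lower()],
        "height_cm": round(row["height_in"] * 2.54, 1),
    }


# After integration every record uses the same names, codes, and units.
integrated = [from_crm(r) for r in crm_rows] + [from_billing(r) for r in billing_rows]
```

Each source gets its own mapping function, so a new source only needs one more mapper that targets the same warehouse format.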

Time-variant:

 Data are stored to provide information from a historical perspective (e.g.,
the past 5–10 years). Every key structure in the data warehouse
contains, either implicitly or explicitly, an element of time.
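
As a rough illustration of a key structure that carries an explicit time element, the sketch below keys each price on a (product, snapshot date) pair. The product identifier, dates, and prices are made up for the example.

```python
from datetime import date

# Key structure = (product_id, snapshot_date): the time element is explicit.
# All values here are hypothetical.
price_history = {
    ("P100", date(2021, 1, 1)): 9.99,
    ("P100", date(2022, 1, 1)): 10.99,
    ("P100", date(2023, 1, 1)): 11.49,
}


def price_as_of(product_id, day):
    """Return the most recent price recorded on or before `day`, else None."""
    candidates = [
        (snapshot, price)
        for (pid, snapshot), price in price_history.items()
        if pid == product_id and snapshot <= day
    ]
    # Tuples compare by snapshot date first, so max() picks the latest one.
    return max(candidates)[1] if candidates else None
```

Because old snapshots are kept rather than overwritten, a call such as `price_as_of("P100", date(2022, 6, 30))` can answer a question about the past.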

Nonvolatile:

 A data warehouse is always a physically separate store of data
transformed from the application data found in the operational
environment.
 Due to this separation, a data warehouse does not require transaction
processing, recovery, and concurrency control mechanisms. It usually
requires only two operations in data accessing: initial loading of
data and access of data.
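
The two operations named above (loading and access) can be sketched as a toy class. `NonvolatileStore` is a hypothetical name used only here; a real warehouse enforces this through its load process, not through a Python class.

```python
class NonvolatileStore:
    """Toy store that exposes only two operations: load and query."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        """Append a batch of rows; rows already stored are never changed."""
        self._rows.extend(dict(r) for r in rows)

    def query(self, predicate=lambda row: True):
        """Return copies of matching rows so callers cannot edit stored data."""
        return [dict(r) for r in self._rows if predicate(r)]


store = NonvolatileStore()
store.load([{"region": "East", "sales": 100}, {"region": "West", "sales": 80}])

east = store.query(lambda r: r["region"] == "East")
east[0]["sales"] = 0  # changes only the returned copy, not the stored row
```

There is deliberately no update or delete method, which is why the sketch needs no transaction or concurrency control logic.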

Data warehouse and a Data mart:

DATA WAREHOUSE
Corporate/Enterprise-wide
Union of all data marts
Takes time to build
Low risk of failure
Structure for corporate view of data
Data received from staging area
Well-structured architecture
Queries on presentation resource

DATA MART
Department-wide
A single business process
Faster and easier implementation
High risk of failure
Structure to suit the departmental view of data
Data received from star joins (facts and dimensions)
Each data mart has its own narrow view of data

Architecture of Data Warehouse:

Data warehouses often adopt a three-tier architecture, as presented in Figure.

Bottom Tier:

 The bottom tier is a warehouse database server that is almost always
a relational database system.
 Back-end tools and utilities are used to feed data into the bottom tier
from operational databases or other external sources (such as
customer profile information provided by external consultants).
 These tools and utilities perform data extraction, cleaning, and
transformation (e.g., to merge similar data from different sources into
a unified format), as well as load and refresh functions to update the
data warehouse.
 The data are extracted using application program interfaces known as
gateways. A gateway is supported by the underlying DBMS and allows
client programs to generate SQL code to be executed at a server.
Examples of gateways include ODBC (Open Database Connectivity) and
OLE DB (Object Linking and Embedding, Database) by Microsoft, and JDBC
(Java Database Connectivity).
 This tier also contains a metadata repository, which stores
information about the data warehouse and its contents.
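
The back-end flow described above (extract, clean, transform, load, plus a metadata entry) can be sketched with Python's built-in `sqlite3` module. Here the DB-API connection stands in for a gateway such as ODBC or JDBC, which is only an analogy. The table names, columns, and sample rows are assumptions made up for this example.

```python
import sqlite3

# Operational source database (hypothetical schema and data).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, cust TEXT, amount TEXT)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, " alice ", "10.50"), (2, "BOB", "n/a"), (3, "Alice", "4.50")],
)

# Warehouse database (bottom tier) with a very small metadata table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_orders (order_id INTEGER, customer TEXT, amount REAL)"
)
warehouse.execute(
    "CREATE TABLE metadata (table_name TEXT, source TEXT, row_count INTEGER)"
)


def extract(conn):
    """Extraction: send SQL to the source through the connection."""
    return conn.execute("SELECT id, cust, amount FROM orders").fetchall()


def clean_and_transform(rows):
    """Drop rows whose amount cannot be parsed; unify the name format."""
    cleaned = []
    for order_id, cust, amount in rows:
        try:
            value = float(amount)
        except ValueError:
            continue  # noisy row, e.g. amount recorded as "n/a"
        cleaned.append((order_id, cust.strip().title(), value))
    return cleaned


def load(conn, rows):
    """Insert the cleaned rows and record the load in the metadata table."""
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.execute(
        "INSERT INTO metadata VALUES (?, ?, ?)",
        ("fact_orders", "orders", len(rows)),
    )
    conn.commit()


load(warehouse, clean_and_transform(extract(source)))
loaded = warehouse.execute(
    "SELECT customer, amount FROM fact_orders ORDER BY order_id"
).fetchall()
```

A refresh would rerun the same three functions on newly arrived source rows.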

Middle Tier:

 The middle tier is an OLAP server that is typically implemented using
either
o a relational OLAP (ROLAP) model, that is, an extended relational
DBMS that maps operations on multidimensional data to
standard relational operations; or
o a multidimensional OLAP (MOLAP) model, that is, a special-purpose
server that directly implements multidimensional data
and operations.
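
To make the ROLAP idea concrete, the sketch below maps one multidimensional operation, a roll-up, onto a standard relational `GROUP BY` query. It uses Python's built-in `sqlite3` module; the `sales` table, its dimensions, and the figures are hypothetical, and a real ROLAP server does far more than this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (year INTEGER, quarter TEXT, city TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (2022, "Q1", "Nagpur", 100.0),
        (2022, "Q2", "Nagpur", 150.0),
        (2022, "Q1", "Pune", 80.0),
        (2023, "Q1", "Pune", 120.0),
    ],
)

# Only known column names may be placed into the SQL text.
ALLOWED_DIMENSIONS = {"year", "quarter", "city"}


def roll_up(dimensions):
    """Sum the amount measure over the kept dimensions using GROUP BY."""
    if not dimensions or not set(dimensions) <= ALLOWED_DIMENSIONS:
        raise ValueError("unknown or empty dimension list")
    cols = ", ".join(dimensions)
    sql = f"SELECT {cols}, SUM(amount) FROM sales GROUP BY {cols} ORDER BY {cols}"
    return conn.execute(sql).fetchall()


by_year_city = roll_up(["year", "city"])  # quarter dimension rolled up
by_year = roll_up(["year"])               # quarter and city rolled up
```

A MOLAP server would instead keep such aggregates in a multidimensional array structure and answer the same question without generating SQL.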

Top Tier:

 The top tier is a front-end client layer, which contains query and
reporting tools, analysis tools, and/or data mining tools (e.g., trend
analysis, prediction, and so on).

The KDD process (Lifecycle of Data Warehousing):

Knowledge discovery is an iterative process that consists of the following steps:

1. Data cleaning: to remove noise and inconsistent data.
2. Data integration: where multiple data sources may be combined.
3. Data selection: where data relevant to the analysis task are retrieved from the database.
4. Data transformation: where data are transformed or consolidated into forms appropriate
for mining by performing summary or aggregation operations.
5. Data mining: an essential process where intelligent methods are applied in order to
extract data patterns.
6. Pattern evaluation: to identify the truly interesting patterns representing knowledge, based
on some interestingness measures.
7. Knowledge presentation: where visualization and knowledge representation techniques
are used to present the mined knowledge to the user.

Steps 1 to 4 are different forms of data preprocessing, where the data are prepared for mining.
The data mining step may interact with the user or a knowledge base.

The interesting patterns are presented to the user and may be stored as new knowledge in the
knowledge base. Data mining is only one step in the knowledge discovery process, but an
essential one, because it uncovers hidden patterns for evaluation.
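
The seven steps can be strung together in a small sketch, one function per step. The purchase records, the two store sources, and the 0.7 support threshold are assumptions chosen for illustration. The mining step is a plain item-frequency count, used as a stand-in for a real mining algorithm.

```python
from collections import Counter

# Two hypothetical sources of purchase records; None marks a noisy value.
store_a = [
    {"cust": "c1", "item": "milk", "till": 3},
    {"cust": "c1", "item": "bread", "till": 3},
    {"cust": "c2", "item": None, "till": 1},
]
store_b = [
    {"cust": "c2", "item": "Milk", "till": 7},
    {"cust": "c2", "item": "bread", "till": 7},
    {"cust": "c3", "item": "milk", "till": 2},
]


def clean(rows):
    """1. Data cleaning: drop missing items, make item names consistent."""
    return [{**r, "item": r["item"].lower()} for r in rows if r["item"]]


def integrate(*sources):
    """2. Data integration: combine the sources into one collection."""
    return [row for src in sources for row in src]


def select(rows):
    """3. Data selection: keep only the attributes relevant to the task."""
    return [(r["cust"], r["item"]) for r in rows]


def transform(pairs):
    """4. Data transformation: consolidate rows into one basket per customer."""
    baskets = {}
    for cust, item in pairs:
        baskets.setdefault(cust, set()).add(item)
    return list(baskets.values())


def mine(baskets):
    """5. Data mining: count in how many baskets each item appears."""
    return Counter(item for basket in baskets for item in basket)


def evaluate(counts, n_baskets, min_support):
    """6. Pattern evaluation: keep items whose support meets the threshold."""
    return {
        item: count / n_baskets
        for item, count in counts.items()
        if count / n_baskets >= min_support
    }


def present(patterns):
    """7. Knowledge presentation: format the patterns for the user."""
    return [f"{item}: {support:.0%}" for item, support in sorted(patterns.items())]


baskets = transform(select(integrate(clean(store_a), clean(store_b))))
patterns = evaluate(mine(baskets), len(baskets), min_support=0.7)
report = present(patterns)
```

Steps 1 to 4 only prepare the data; the pattern itself appears at step 5 and is filtered at step 6 by support, which plays the role of the interestingness measure here.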

Prof.Jayant S. Rohankar
Subject Incharge
