Introduction To DW

Ma’am Mich

What is a Data Warehouse?


1. A Data Warehouse is a collection of data that serves
the entire organization, not only a particular
group of users.
2. It is not used for daily operations and transaction
processing; it is used for making decisions.
A Data Warehouse can be viewed as a data system with the following
attributes:
● It is a database designed for investigative tasks, using data from various
applications.
● It supports a relatively small number of clients with relatively long
interactions.
● It includes current and historical data to provide a historical perspective of
information.
● Its usage is read-intensive.
● It contains a few large tables.
● "Data Warehouse is a subject-oriented, integrated, and time-variant store of
information in support of management's decisions."
Characteristics of Data Warehouse
Subject-Oriented
● A data warehouse focuses on the modeling and analysis of data
for decision-makers. Therefore, data warehouses typically
provide a concise and straightforward view around a particular
subject, such as customer, product, or sales, instead of the
organization's ongoing operations as a whole. This is done by
excluding data that are not useful for the subject and
including all data users need to understand it.
Integrated
A data warehouse integrates various heterogeneous data sources, such as
RDBMSs, flat files, and online transaction records. Data cleaning and
integration must be performed during data warehousing to ensure consistency
in naming conventions, attribute types, etc., among the different data
sources.
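As a rough illustration of the cleaning and integration step described above, the sketch below maps records from two source systems onto one set of naming and type conventions. The source systems, field names, and code tables are all hypothetical, chosen only for this example.

```python
from datetime import date

# Hypothetical records from two source systems with inconsistent conventions.
crm_rows = [{"CustID": "17", "signup": "2023-01-05", "Gender": "F"}]
billing_rows = [{"customer_id": 17, "signup_dt": "05/01/2023", "sex": "female"}]

# Assumed code table that unifies the different gender encodings.
GENDER_CODES = {"f": "F", "female": "F", "m": "M", "male": "M"}


def from_crm(row):
    """Map a CRM record (year-month-day dates) onto the warehouse conventions."""
    year, month, day = (int(part) for part in row["signup"].split("-"))
    return {
        "customer_id": int(row["CustID"]),
        "signup_date": date(year, month, day),
        "gender": GENDER_CODES[row["Gender"].lower()],
    }


def from_billing(row):
    """Map a billing record (day/month/year dates) onto the same conventions."""
    day, month, year = (int(part) for part in row["signup_dt"].split("/"))
    return {
        "customer_id": int(row["customer_id"]),
        "signup_date": date(year, month, day),
        "gender": GENDER_CODES[row["sex"].lower()],
    }


# After integration, both sources use the same names, types, and codes.
integrated = [from_crm(r) for r in crm_rows] + [from_billing(r) for r in billing_rows]
```

Once both records pass through their mapping function, they can be compared or merged directly, which is the kind of consistency the integration step is meant to provide.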
Time-Variant
Historical information is kept in a data warehouse.
For example, one can retrieve data from 3 months,
6 months, or 12 months ago, or even older data, from a
data warehouse. This contrasts with a
transaction system, where often only the most
current data is kept.
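The contrast can be sketched with two tables in an in-memory SQLite database: one that keeps only the current value, as a transaction system often does, and one that keeps a dated snapshot of every value. The table names, the `record_price` helper, and the sample prices are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Operational table: one row per product, overwritten on every change.
conn.execute("CREATE TABLE op_price (product TEXT PRIMARY KEY, price REAL)")
# Warehouse table: every observed value is kept with its snapshot date.
conn.execute("CREATE TABLE dw_price (product TEXT, price REAL, snapshot_date TEXT)")


def record_price(product, price, snapshot_date):
    """Overwrite the operational row, but append a dated row to the warehouse."""
    conn.execute("INSERT OR REPLACE INTO op_price VALUES (?, ?)", (product, price))
    conn.execute(
        "INSERT INTO dw_price VALUES (?, ?, ?)", (product, price, snapshot_date)
    )


record_price("widget", 9.99, "2024-01-31")
record_price("widget", 10.49, "2024-04-30")
record_price("widget", 10.99, "2024-07-31")

# The operational table only knows the latest price.
current = conn.execute(
    "SELECT price FROM op_price WHERE product = 'widget'"
).fetchall()

# The warehouse table can answer questions about any earlier period.
history = conn.execute(
    "SELECT snapshot_date, price FROM dw_price "
    "WHERE product = 'widget' ORDER BY snapshot_date"
).fetchall()
```

Here `current` holds a single row, while `history` holds one row per snapshot, so a query about the price three or six months ago can only be answered from the warehouse table.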
Non-Volatile
● The data warehouse is a physically separate data store, populated with
data transformed from the source operational RDBMS. Operational
updates of data do not occur in the data warehouse, i.e., update,
insert, and delete operations are not performed. It usually requires
only two procedures in data accessing: initial loading of data and
access to data. Therefore, the DW does not require transaction
processing, recovery, and concurrency capabilities, which allows for
substantial speedup of data retrieval. Non-volatile means that once
data has entered the warehouse, it should not change.
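One way to picture the "two procedures" idea is a store whose interface offers only loading and reading. The class below is a hypothetical teaching sketch, not how any particular warehouse product is implemented; the name `NonVolatileStore` and the sample rows are invented for this example.

```python
class NonVolatileStore:
    """Hypothetical store exposing only the two procedures: load and read."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        """Append a batch of rows; each row is frozen as an immutable tuple."""
        self._rows.extend(tuple(row) for row in rows)

    def read(self, predicate=lambda row: True):
        """Return the rows that satisfy the predicate, leaving the store unchanged."""
        return [row for row in self._rows if predicate(row)]


store = NonVolatileStore()
# Initial load, followed by a later periodic load.
store.load([("2024-01-31", "widget", 120), ("2024-01-31", "gadget", 75)])
store.load([("2024-02-29", "widget", 140)])

# Access to data: earlier rows are still present and unchanged.
widget_rows = store.read(lambda row: row[1] == "widget")
```

Because the interface has no update or delete method and rows are stored as tuples, data that has been loaded stays as it was, and later loads only add to it.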
History of Data Warehouse
1. The idea of data warehousing dates to the late 1980s, when IBM researchers
Barry Devlin and Paul Murphy developed the "Business Data Warehouse."
2. In essence, the data warehousing idea was intended to provide an
architectural model for the flow of information from operational systems
to decision support environments. The concept attempted to address the
various problems associated with this flow, mainly the high costs associated
with it.
3. In the absence of a data warehousing architecture, a vast amount of space was
required to support multiple decision support environments. In large
corporations, it was common for various decision support environments to
operate independently.
Goals of Data Warehousing
• Help reporting as well as analysis
• Maintain the organization's historical information
• Be the foundation for decision making.
Need for Data Warehouse
Data Warehouse is needed for the following reasons:
1) Business users: Business users require a data warehouse to view
summarized data from the past. Since these users are non-technical,
the data may be presented to them in a simple form.
2) Store historical data: A data warehouse is required to store
time-variant data from the past. This data can be used for various
purposes.
3) Make strategic decisions: Some strategies may depend on the
data in the data warehouse, so the data warehouse contributes to making
strategic decisions.
4) For data consistency and quality: By bringing data from different
sources to a common place, the user can effectively bring
uniformity and consistency to the data.
5) Quick response time: A data warehouse has to be ready for somewhat
unexpected loads and types of queries, which demands a significant degree
of flexibility and quick response time.
Benefits of Data Warehouse
1. Understand business trends and make better forecasting decisions.
2. Data warehouses are designed to perform well with enormous amounts of data.
3. The structure of data warehouses is easier for end-users to navigate,
understand, and query.
4. Queries that would be complex in many normalized databases could be easier
to build and maintain in data warehouses.
5. Data warehousing is an efficient method to manage demand for lots of
information from lots of users.
6. Data warehousing provides the capability to analyze a large amount of
historical data.
A data warehouse is a single data repository where records from multiple
data sources are integrated for online analytical processing (OLAP). This
implies that a data warehouse needs to meet the requirements of all the business stages
within the entire organization. Thus, data warehouse design is a hugely complex,
lengthy, and hence error-prone process. Furthermore, business analytical functions
change over time, which results in changes in the requirements for the systems.
Therefore, data warehouse and OLAP systems are dynamic, and the design process is
continuous.
Data warehouse design takes a different approach from view materialization
as practiced in industry. It treats data warehouses as database systems with particular needs,
such as answering management-related queries. The goal of the design becomes how
records from multiple data sources should be extracted, transformed, and loaded
(ETL) and organized in a database as the data warehouse.
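The extract, transform, and load steps can be sketched end to end with the Python standard library. This is a minimal illustration under stated assumptions: the CSV layout, the cleaning rules (trim and lowercase names, skip rows with no amount), and the `orders` table are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; a real source might be a file or another database.
SOURCE_CSV = """order_id,customer,amount,order_date
1, alice ,19.50,2024-03-01
2,BOB,5.00,2024-03-02
3,alice,,2024-03-02
"""


def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize names and types, skipping rows with no amount."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # assumed rule: incomplete rows are not loaded
        cleaned.append(
            (
                int(row["order_id"]),
                row["customer"].strip().lower(),
                float(row["amount"]),
                row["order_date"],
            )
        )
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, customer TEXT, amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))

# A management-style query over the loaded data.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
```

Keeping the three steps as separate functions means a new source only needs its own `extract` and `transform`, while the load step and the warehouse table stay the same.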
There are two approaches:

• "top-down" approach
• "bottom-up" approach
Top-down Design Approach
In the "Top-Down" design approach (associated with Bill Inmon), a data warehouse
is described as a subject-oriented, time-variant, non-volatile, and integrated data
repository for the entire enterprise. Data from different sources are validated,
reformatted, and saved in a normalized (up to 3NF) database as the data warehouse.
The data warehouse stores "atomic" information, the data at the lowest level of
granularity, from which dimensional data marts can be built by selecting the data
required for specific business subjects or particular departments. This is a
data-driven approach, as the information is gathered and integrated first, and then
the business requirements by subject for building data marts are formulated. The
advantage of this method is that it provides a single integrated data source. Thus
data marts built from it will be consistent where they overlap.
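A small sketch of the top-down flow: atomic rows sit in a central warehouse table, and a data mart is derived from it by selecting and summarizing the data one department needs. The table names (`dw_sales`, `mart_north_monthly`), the columns, and the sample rows are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Central warehouse table holding atomic (lowest-granularity) sales rows.
conn.execute(
    "CREATE TABLE dw_sales (sale_id INTEGER PRIMARY KEY, region TEXT, "
    "product TEXT, sale_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO dw_sales VALUES (?, ?, ?, ?, ?)",
    [
        (1, "north", "widget", "2024-01-15", 100.0),
        (2, "north", "gadget", "2024-01-20", 50.0),
        (3, "south", "widget", "2024-01-22", 70.0),
        (4, "north", "widget", "2024-02-03", 30.0),
    ],
)

# Data mart for one department: selected and summarized from the warehouse.
conn.execute(
    """
    CREATE TABLE mart_north_monthly AS
    SELECT substr(sale_date, 1, 7) AS month, product, SUM(amount) AS total
    FROM dw_sales
    WHERE region = 'north'
    GROUP BY substr(sale_date, 1, 7), product
    """
)

mart = conn.execute(
    "SELECT month, product, total FROM mart_north_monthly ORDER BY month, product"
).fetchall()
```

Because every mart is computed from the same `dw_sales` rows, two marts that cover overlapping data will report the same figures for the overlap.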
Advantages of top-down design

• Data marts are loaded from the data warehouse.
• Developing a new data mart from the data warehouse is
very easy.

Disadvantages of top-down design

• This technique is inflexible to changing departmental
needs.
• The cost of implementing the project is high.
Bottom-Up Design Approach
In the "Bottom-Up" approach (associated with Ralph Kimball), a data warehouse
is described as "a copy of transaction data specifically structured for query
and analysis," in a form termed the star schema. In this approach, a data mart
is created first to provide the necessary reporting and analytical capabilities
for particular business processes (or subjects). Thus it is a business-driven
approach, in contrast to Inmon's data-driven approach.
The advantage of the "bottom-up" design approach is that it has
quick ROI, as developing a data mart, a data warehouse for a single subject,
takes far less time and effort than developing an enterprise-wide data
warehouse. Also, the risk of failure is lower. This method is inherently
incremental and allows the project team to learn and grow.
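For readers unfamiliar with the star schema, the sketch below builds a tiny single-subject data mart: one fact table of sales surrounded by product and date dimension tables. All table names, columns, and rows are hypothetical and exist only to show the shape of the schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables describe the "who/what/when" of each fact.
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY, name TEXT, category TEXT
    );
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY, full_date TEXT, quarter TEXT
    );
    -- The fact table holds the measures, keyed by the dimensions.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        date_key INTEGER REFERENCES dim_date(date_key),
        units INTEGER,
        revenue REAL
    );
    INSERT INTO dim_product VALUES
        (1, 'widget', 'hardware'), (2, 'ebook', 'digital');
    INSERT INTO dim_date VALUES
        (20240115, '2024-01-15', '2024-Q1'), (20240410, '2024-04-10', '2024-Q2');
    INSERT INTO fact_sales VALUES
        (1, 20240115, 3, 30.0), (2, 20240115, 1, 8.0), (1, 20240410, 2, 20.0);
    """
)

# A typical analytical query: join the fact table to its dimensions and aggregate.
by_category_quarter = conn.execute(
    """
    SELECT p.category, d.quarter, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_date AS d ON d.date_key = f.date_key
    GROUP BY p.category, d.quarter
    ORDER BY p.category, d.quarter
    """
).fetchall()
```

Adding another subject later means adding another fact table that reuses the same dimension tables where they apply, which fits the incremental nature of the bottom-up method.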
Advantages of Bottom-up Design
• Documents can be generated quickly.
• The data warehouse can be extended to accommodate new
business units.
• Extending it only requires developing new data marts and then integrating
them with the other data marts.

Disadvantages of Bottom-up Design


• The positions of the data warehouse and the data marts are reversed in the bottom-up
design approach: the data marts are built first, and the data warehouse is assembled from them.
Differentiate between Top-Down Design Approach and Bottom-Up Design Approach
