Data Warehouse
Wachira Davis
DATA WAREHOUSE
Providing comprehensive analysis of the organization, its business, its requirements, and any
trends requires access not only to the current values in the database but also to historical data.
To facilitate this type of analysis, the data warehouse has been created to hold data drawn from
several data sources, maintained by different operating units, together with historical and
summary transformations.
A data warehouse stores data that have been extracted from the various operational,
external, and other databases of an organization.
It is a central source of data that have been cleaned, transformed, and cataloged so they can be
used by managers and other business professionals for data mining, online analytical processing,
and other forms of business analysis, market research, and decision support.
Data warehouses may be subdivided into data marts, which hold subsets of data from the
warehouse that focus on specific aspects of a company, such as a department or a business
process.
The data warehouse, which is based on extended database technology, provides the management
of the data store.
However, decision-makers also require powerful analysis tools.
Two main types of analysis tools have emerged over the last few years:
o Online Analytical Processing (OLAP) tools
o data mining tools
The original concept of a data warehouse was devised by IBM as the information warehouse and
presented as a solution for accessing data held in non-relational systems.
The information warehouse was proposed to allow organizations to use their data archives to
help them gain a business advantage.
However, due to the sheer complexity and performance problems associated with the
implementation of such solutions, the early attempts at creating an information warehouse
were mostly rejected. Since then, the concept of data warehousing has been raised several
times, but only in recent years has data warehousing come to be seen as a valuable and
viable solution.
The latest and most successful advocate for data warehousing is Bill Inmon, who has earned the
title of 'father of data warehousing' due to his active promotion of the concept.
Inmon defines a data warehouse as a subject-oriented, integrated, time-variant, and
non-volatile collection of data in support of management's decision-making process.
The Web is an immense source of behavioral data as individuals interact through their Web
browsers with remote Web sites.
The data generated by this behavior is called clickstream.
Using a data warehouse on the Web to harness clickstream data has led to the development of
Data Webhouses.
Potential high returns on investment: An organization must commit a huge amount of resources
to ensure the successful implementation of a data warehouse, and the cost can vary enormously
due to the variety of technical solutions available.
Competitive advantage: The huge returns on investment for those companies that have
successfully implemented a data warehouse are evidence of the enormous competitive advantage
that accompanies this technology. The competitive advantage is gained by allowing
decision-makers access to data that can reveal previously unavailable, unknown, and untapped
information on, for example, customers, trends, and demands.
An organization will normally have a number of different OLTP systems for business processes
such as inventory control, customer invoicing, and point-of-sale. These systems generate
operational data that is detailed, current, and subject to change. The OLTP systems are
optimized for a high number of transactions that are predictable, repetitive, and update
intensive. The OLTP data is organized according to the requirements of the transactions
associated with the business applications and supports the day-to-day decisions of a large
number of concurrent operational users.
In contrast, an organization will normally have a single data warehouse, which holds data that is
historical, detailed, and summarized to various levels and rarely subject to change (other than
being supplemented with new data). The data warehouse is designed to support relatively low
numbers of transactions that are unpredictable in nature and require answers to queries that
are ad hoc, unstructured, and heuristic. The warehouse data is organized according to the
requirements of potential queries and supports the long-term strategic decisions of a relatively
low number of managerial users.
Although OLTP systems and data warehouses have different characteristics and are built with
different purposes in mind, these systems are closely related in that the OLTP systems provide
the source data for the warehouse. A major problem of this relationship is that the data held by
the OLTP systems can be inconsistent, fragmented, and subject to change, containing duplicate
or missing entries. As such, the operational data must be “cleaned up” before it can be used
in the data warehouse.
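As a rough illustration of what "cleaning up" can mean, the sketch below normalizes inconsistent fields, drops rows with a missing key, and removes duplicates. The function name `clean_customers`, the field names, and the sample rows are all hypothetical and chosen only for this example; real cleaning rules depend on the source systems.

```python
def clean_customers(rows: list[dict]) -> list[dict]:
    """Return cleaned copies of operational rows, ready for loading."""
    seen: set[str] = set()
    cleaned: list[dict] = []
    for row in rows:
        # Normalize inconsistent formatting (case, stray whitespace).
        email = (row.get("email") or "").strip().lower()
        name = " ".join((row.get("name") or "").split()).title()
        # A missing city is kept but marked explicitly (an assumed rule).
        city = (row.get("city") or "").strip().title() or "Unknown"
        if not email:
            # Missing key: the row cannot be matched to a customer, so skip it.
            continue
        if email in seen:
            # Duplicate entry: keep only the first occurrence.
            continue
        seen.add(email)
        cleaned.append({"email": email, "name": name, "city": city})
    return cleaned


# Hypothetical operational data with the problems described above.
raw_rows = [
    {"email": "Ann@Example.com ", "name": "ann  lee", "city": "glasgow"},
    {"email": "ann@example.com", "name": "Ann Lee", "city": "Glasgow"},
    {"email": None, "name": "No Email", "city": "London"},
    {"email": "bob@example.com", "name": "bob ray", "city": None},
]
cleaned_rows = clean_customers(raw_rows)
```

Here the second row is treated as a duplicate of the first once the e-mail address is normalized, and the third row is dropped because it has no key.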
OLTP systems are not built to answer ad hoc queries. They also tend not to store the historical
data needed to analyse trends, and the large amounts of raw data they do hold are not easily
analysed. A data warehouse allows a wider range of queries to be answered. An example is: what
were the three most popular areas in each city for the renting of property in 2004, and how does
this compare with the results for the previous two years?
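The property-rental question above could be answered along the following lines once historical data is held in one place. This is only a sketch using Python's built-in sqlite3 module. The table `rental_fact`, its columns, and the sample rows are assumptions made for illustration and do not come from any real schema.

```python
import sqlite3
from collections import defaultdict


def top_areas(conn, year, limit=3):
    """Map each city to its most-rented areas for one year, busiest first."""
    rows = conn.execute(
        """
        SELECT city, area, COUNT(*) AS rentals
        FROM rental_fact
        WHERE rent_year = ?
        GROUP BY city, area
        ORDER BY city, rentals DESC, area
        """,
        (year,),
    ).fetchall()
    ranked = defaultdict(list)
    for city, area, rentals in rows:
        # Rows arrive busiest-first within each city, so keep the first few.
        if len(ranked[city]) < limit:
            ranked[city].append((area, rentals))
    return dict(ranked)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rental_fact (city TEXT, area TEXT, rent_year INTEGER)")
# Hypothetical historical rows; one row per rental agreement.
sample = [
    ("Glasgow", "West End", 2004), ("Glasgow", "West End", 2004),
    ("Glasgow", "Hyndland", 2004), ("Glasgow", "West End", 2003),
    ("Glasgow", "Hyndland", 2003), ("Glasgow", "Hyndland", 2003),
    ("London", "Camden", 2004), ("London", "Camden", 2002),
]
conn.executemany("INSERT INTO rental_fact VALUES (?, ?, ?)", sample)

# Compare 2004 with the two previous years.
comparison = {year: top_areas(conn, year) for year in (2002, 2003, 2004)}
```

The query depends on rows for 2002 and 2003 still being available, which is exactly the historical data an OLTP system tends not to keep.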
investing in easier-to-use, more powerful tools, or in providing better training for the users. A
further reason for increasing demands on IS staff is that once a data warehouse is online, it is
often the case that the number of users and queries increases, together with requests for answers
to more and more complex queries.
Data homogenization
Large-scale data warehousing can become an exercise in data homogenization that lessens the
value of the data. For example, in producing a consolidated and integrated view of the
organization's data, the warehouse designer may be tempted to emphasize similarities rather
than differences in the data used by different application areas such as property sales and
property renting.
Data ownership
Data warehousing may change the attitude of end-users to the ownership of data. Sensitive data
that was originally viewed and used only by a particular department or business area, such as
sales or marketing, may now be made accessible to others in the organization.
High maintenance
Data warehouses are high maintenance systems. Any reorganization of the business processes
and the source systems may affect the data warehouse. To remain a valuable resource, the data
warehouse must remain consistent with the organization that it supports.
Long-duration projects
A data warehouse represents a single data resource for the organization. However, the building
of a warehouse can take up to three years, which is why some organizations are building data
marts.
Complexity of integration
The most important area for the management of a data warehouse is its integration
capabilities. An organization must therefore spend a significant amount of time determining
how well the various data warehousing tools can be integrated into the overall solution
that is needed. This can be a very difficult task, as there are a number of tools for every
operation of the data warehouse, and they must integrate well for the warehouse to work
to the organization's benefit.
DATA MART
A data mart is a subset of a data warehouse that supports the requirements of a particular
department or business function.
It normally holds its data in the form of summary data relating to that department or
business function.
The data mart can be standalone or linked centrally to the corporate data warehouse.
As a data warehouse grows larger, the ability to serve the various needs of the organization may
be compromised. The popularity of data marts stems from the fact that corporate-wide data
warehouses are proving difficult to build and use.
The characteristics that differentiate data marts and data warehouses include:
o a data mart focuses on only the requirements of users associated with one department
  or business function;
o data marts do not normally contain detailed operational data, unlike data warehouses;
o as data marts contain less data compared with data warehouses, data marts are more
  easily understood and navigated.
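The characteristics listed above can be sketched with a small example in which a data mart is derived from warehouse data as a summary for a single department. The table names `warehouse_sales` and `department_mart`, the columns, and the figures are hypothetical. A real data mart would usually be built by dedicated loading tools rather than a short script.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Detailed warehouse data covering several departments (made-up rows).
conn.execute(
    "CREATE TABLE warehouse_sales ("
    "department TEXT, product TEXT, sale_month TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)",
    [
        ("rentals", "flat", "2004-01", 500.0),
        ("rentals", "flat", "2004-01", 700.0),
        ("rentals", "house", "2004-02", 900.0),
        ("sales", "house", "2004-01", 150000.0),
    ],
)


def build_mart(conn, department):
    """Rebuild a summary-only mart for one department and return its rows."""
    conn.execute("DROP TABLE IF EXISTS department_mart")
    conn.execute(
        "CREATE TABLE department_mart ("
        "sale_month TEXT, product TEXT, total_amount REAL, n_sales INTEGER)"
    )
    # Only summarized rows for the chosen department are copied across.
    conn.execute(
        """
        INSERT INTO department_mart
        SELECT sale_month, product, SUM(amount), COUNT(*)
        FROM warehouse_sales
        WHERE department = ?
        GROUP BY sale_month, product
        """,
        (department,),
    )
    return conn.execute(
        "SELECT sale_month, product, total_amount, n_sales "
        "FROM department_mart ORDER BY sale_month, product"
    ).fetchall()


mart_rows = build_mart(conn, "rentals")
```

The resulting mart holds two summary rows for the rentals department and none of the individual transactions, which is why it is smaller and easier to navigate than the warehouse table it came from.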
If you use the Web to place an order or view a product catalog, you are probably using a
Web site linked to an internal corporate database. Many companies now use the Web to
make some of the information in their internal databases available to customers and business
partners.
Suppose, for example, a customer with a Web browser wants to search an online retailer's
database for pricing information. The user accesses the retailer's Web site over the Internet
using Web browser software on his or her client PC. The user's Web browser software requests
data from the organization's database, using HTTP requests (typically generated from HTML
forms or links) to communicate with the Web server.
Because many "back-end" databases cannot interpret these Web requests, the Web
server passes the requests for data to software that translates them into SQL so
that they can be processed by the DBMS working with the database. In a client/server
environment, the DBMS resides on a dedicated computer called a database server. The DBMS
receives the SQL requests and provides the required data. The middleware transfers
information from the organization's internal database back to the Web server for delivery in
the form of a Web page to the user.
The middleware working between the Web server and the DBMS could be an application server
running on its own dedicated computer. The application server software handles all application
operations, including transaction processing and data access, between browser-based
computers and a company's back-end business applications or databases. The application server
takes requests from the Web server, runs the business logic to process transactions based on
those requests, and provides connectivity to the organization's back-end systems or databases.
Alternatively, the software for handling these operations could be a custom program or a CGI
script. A CGI script is a compact program using the Common Gateway Interface (CGI)
specification for processing data on a Web server.
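The role of this middleware can be illustrated with a minimal sketch: a function that takes the URL of a Web request, translates it into an SQL query, and returns an HTML fragment for the Web server to deliver. The function `handle_price_request`, the `products` table, and the `shop.example` URL are hypothetical and stand in for whatever application server, custom program, or CGI script an organization actually uses.

```python
import html
import sqlite3
from urllib.parse import parse_qs, urlparse


def handle_price_request(conn, url):
    """Translate a Web request into SQL and return an HTML fragment."""
    params = parse_qs(urlparse(url).query)
    product = params.get("product", [""])[0]
    # The request value is bound as a parameter, never pasted into the SQL text.
    row = conn.execute(
        "SELECT name, price FROM products WHERE name = ?", (product,)
    ).fetchone()
    if row is None:
        return "<p>No such product.</p>"
    name, price = row
    # Escape database text before placing it in the page.
    return f"<p>{html.escape(name)}: {price:.2f}</p>"


# Stand-in for the organization's back-end database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price REAL)")
conn.execute("INSERT INTO products VALUES ('kettle', 24.5)")

page = handle_price_request(conn, "http://shop.example/price?product=kettle")
missing = handle_price_request(conn, "http://shop.example/price?product=toaster")
```

Binding the request value as a query parameter, rather than building the SQL string by hand, is a deliberate choice in this sketch: the database is now reachable from the public Web, so request text should not be trusted.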
There are a number of advantages to using the Web to access an organization's internal
databases. First, Web browser software is much easier to use than proprietary query tools.
Second, the Web interface requires few or no changes to the internal database. It costs much
less to add a Web interface in front of a legacy system than to redesign and rebuild the system
to improve user access.
Accessing corporate databases through the Web is creating new efficiencies, opportunities, and
business models. Can you think of a firm that illustrates this?