Data Warehouse

Data warehouse

Wachira Davis

DATA WAREHOUSE
 Providing a comprehensive analysis of the organization, its business, its requirements, and any
trends requires access not only to the current values in the database but also to historical data.
 To facilitate this type of analysis, the data warehouse has been created to hold data drawn from
several data sources, maintained by different operating units, together with historical and
summary transformations.
 A data warehouse stores data that have been extracted from the various operational and
external or other databases of an organization.
 It is a central source of data that have been cleaned, transformed, and cataloged so they can be
used by managers and other business professionals for data mining, online analytical processing,
and other forms of business analysis, market research, and decision support.
 Data warehouses may be subdivided into data marts, which hold subsets of data from the
warehouse that focus on specific aspects of a company, such as a department or a business
process.
 The data warehouse based on extended database technology provides the management of the
data store.
 However, decision-makers also require powerful analysis tools.
 Two main types of analysis tools have emerged over the last few years:
o Online Analytical Processing (OLAP) and data mining tools.

The Evolution of Data Warehousing


 Since the 1970s, organizations have mostly focused their investment on new computer systems
that automate business processes.
 In this way, organizations gained competitive advantage through systems that offered more
efficient and cost-effective services to the customer.
 Throughout this period, organizations accumulated growing amounts of data stored in their
operational databases.
 However, in recent times, where such systems are commonplace, organizations are focusing on
ways to use operational data to support decision-making, as a means of regaining competitive
advantage.
 Operational systems were never designed to support such business activities and so using these
systems for decision-making may never be an easy solution.
 The legacy is that a typical organization may have numerous operational systems with
overlapping and sometimes contradictory definitions, such as data types.
 The challenge for an organization is to turn its archives of data into a source of knowledge, so
that a single integrated/consolidated view of the organization's data is presented to the user.
 The concept of a data warehouse was deemed the solution to meet the requirements of a
system capable of supporting decision-making, receiving data from multiple operational data
sources.

Data Warehousing Concepts



 The original concept of a data warehouse was devised by IBM as the information warehouse and
presented as a solution for accessing data held in non-relational systems.
 The information warehouse was proposed to allow organizations to use their data archives to
help them gain a business advantage.
 However, due to the sheer complexity and performance problems associated with the
implementation of such solutions, the early attempts at creating an information warehouse
were mostly rejected. Since then, the concept of data warehousing has been raised several
times, but only in recent years has data warehousing come to be seen as a valuable and
viable solution.
 The latest and most successful advocate for data warehousing is Bill Inmon, who has earned the
title of 'father of data warehousing' due to his active promotion of the concept.
 A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of
data in support of management's decision-making process.

In this definition by Inmon the data is:


 Subject-oriented as the warehouse is organized around the major subjects of the enterprise
(such as customers, products, and sales) rather than the major application areas (such as
customer invoicing, stock control, and product sales). This is reflected in the need to store
decision-support data rather than application-oriented data.
 Integrated because of the coming together of source data from different enterprise-wide
application systems. The source data is often inconsistent, using, for example, different
formats. The integrated data source must be made consistent to present a unified view of
the data to the users.
 Time-variant because data in the warehouse is only accurate and valid at some point in time
or over some time interval. The time-variance of the data warehouse is also shown in the
extended time that the data is held, the implicit or explicit association of time with all data,
and the fact that the data represents a series of snapshots.
 Non-volatile as the data is not updated in real time but is refreshed from operational
systems on a regular basis. New data is always added as a supplement to the database,
rather than a replacement. The database continually absorbs this new data, incrementally
integrating it with the previous data.
 There are several definitions of data warehousing, and some widen the scope of the definition
to include the processing associated with accessing the data from the original sources through
to the delivery of the data to the decision-makers.
 Whatever the definition, the ultimate goal of data warehousing is to integrate enterprise-wide
corporate data into a single repository from which users can easily run queries,
produce reports, and perform analysis. In summary, a data warehouse is data management
and data analysis technology.
 In recent years a new term associated with data warehousing has been used, namely Data
Webhouse.
 A Data Webhouse is a distributed data warehouse that is implemented over the Web with no
central data repository.
 The Web is an immense source of behavioral data as individuals interact through their Web
browsers with remote Web sites.
 The data generated by this behavior is called clickstream.
 Using a data warehouse on the Web to harness clickstream data has led to the development of
Data Webhouses.
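
The time-variant and non-volatile properties in Inmon's definition can be illustrated with a small sketch. This is only an illustration under stated assumptions: the in-memory list, the function name load_snapshot, and the customer and city values are all hypothetical and do not come from any particular warehouse product.

```python
from datetime import date

# Hypothetical in-memory "warehouse": rows are only ever appended, never updated.
warehouse = []


def load_snapshot(warehouse, operational_rows, snapshot_date):
    """Append a dated copy of the current operational rows (non-volatile load)."""
    for row in operational_rows:
        # Copy the row so later changes in the operational system
        # cannot alter what the warehouse has already stored.
        warehouse.append({**row, "snapshot_date": snapshot_date})


# The operational system holds only the current value for each customer.
operational = [{"customer_id": 1, "city": "City A"}]
load_snapshot(warehouse, operational, date(2004, 1, 31))

# The operational system overwrites the old value in place...
operational[0]["city"] = "City B"
load_snapshot(warehouse, operational, date(2004, 2, 29))

# ...but the warehouse keeps both snapshots, each associated with a point in time.
history = [(r["snapshot_date"].isoformat(), r["city"]) for r in warehouse]
print(history)
```

Each refresh adds a new dated snapshot rather than replacing earlier data, which is what allows trends to be analysed later.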

Benefits of Data Warehousing


 The successful implementation of a data warehouse can bring major benefits to an organization
including:

 Potential high returns on investment: An organization must commit a huge amount of resources
to ensure the successful implementation of a data warehouse, and the cost can vary enormously
due to the variety of technical solutions available.

 Competitive advantage: The huge returns on investment for those companies that have
successfully implemented a data warehouse are evidence of the enormous competitive advantage
that accompanies this technology. The competitive advantage is gained by allowing decision-makers
access to data that can reveal previously unavailable, unknown, and untapped
information on, for example, customers, trends, and demands.

 Increased productivity of corporate decision-makers: Data warehousing improves
the productivity of corporate decision-makers by creating an integrated database of consistent,
subject-oriented, historical data. It integrates data from multiple incompatible systems into a
form that provides one consistent view of the organization. By transforming data into
meaningful information, a data warehouse allows corporate decision-makers to perform more
substantive, accurate, and consistent analysis.

Comparison of OLTP Systems and Data Warehousing


 A DBMS built for Online Transaction Processing (OLTP) is generally regarded as unsuitable for
data warehousing because each system is designed with a differing set of requirements in mind.
For example, OLTP systems are designed to maximize the transaction processing capacity, while
data warehouses are designed to support ad hoc query processing.

 An organization will normally have a number of different OLTP systems for business processes
such as inventory control, customer invoicing, and point-of-sale. These systems generate
operational data that is detailed, current, and subject to change. The OLTP systems are
optimized for a high number of transactions that are predictable, repetitive, and update
intensive. The OLTP data is organized according to the requirements of the transactions
associated with the business applications and supports the day-to-day decisions of a large
number of concurrent operational users.
 In contrast, an organization will normally have a single data warehouse, which holds data that is
historical, detailed, and summarized to various levels and rarely subject to change (other than
being supplemented with new data). The data warehouse is designed to support relatively low
numbers of transactions that are unpredictable in nature and require answers to queries that
are ad hoc, unstructured, and heuristic. The warehouse data is organized according to the
requirements of potential queries and supports the long-term strategic decisions of a relatively
low number of managerial users.

 Although OLTP systems and data warehouses have different characteristics and are built with
different purposes in mind, these systems are closely related in that the OLTP systems provide
the source data for the warehouse. A major problem of this relationship is that the data held by
the OLTP systems can be inconsistent, fragmented, and subject to change, containing duplicate
or missing entries. As such, the operational data must be “cleaned up” before they can be used
in the data warehouses.
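
A minimal sketch of such a clean-up step is shown below. It assumes a hypothetical list of owner records with name and city fields; the function name clean and the rule of dropping incomplete rows are illustrative choices only, since a real pipeline might repair or log such rows instead.

```python
def clean(rows):
    """Normalize formats, skip rows with missing values, and remove duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        # Make inconsistent formats consistent (stray spaces, mixed case).
        name = (row.get("name") or "").strip().title()
        city = (row.get("city") or "").strip().title()
        if not name or not city:
            # Missing entry: skipped in this sketch.
            continue
        key = (name, city)
        if key in seen:
            # Duplicate entry: keep only the first occurrence.
            continue
        seen.add(key)
        cleaned.append({"name": name, "city": city})
    return cleaned


# Hypothetical operational rows: two spellings of one record and one incomplete row.
raw = [
    {"name": "ann smith", "city": "glasgow"},
    {"name": "Ann Smith ", "city": "Glasgow"},
    {"name": "Bob Jones", "city": None},
]
cleaned = clean(raw)
print(cleaned)
```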

 OLTP systems are not built to answer ad hoc queries. They also tend not to store the historical
data that is necessary to analyse trends, and they hold large amounts of raw data that is not
easily analysed. A data warehouse can answer a wider range of queries. An example is: what were
the three most popular areas in each city for the renting of property in 2004, and how does
this compare with the results for the previous two years?
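
The property-rental question above can be sketched as an ad hoc query. This is a sketch under assumptions: the rental table, its columns, the city and area names, and the function top_areas are hypothetical, and SQLite is used only because it ships with Python, not because the source names it.

```python
import sqlite3
from collections import defaultdict

# Hypothetical warehouse table: one row per rental, kept for several years.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rental (city TEXT, area TEXT, year INTEGER)")
rows = (
    [("CityA", "North", 2004)] * 4
    + [("CityA", "East", 2004)] * 3
    + [("CityA", "South", 2004)] * 2
    + [("CityA", "West", 2004)] * 1
    + [("CityA", "West", 2003)] * 5
)
conn.executemany("INSERT INTO rental VALUES (?, ?, ?)", rows)


def top_areas(conn, year, n=3):
    """Return the n most-rented areas per city for the given year."""
    query = """
        SELECT city, area, COUNT(*) AS rentals
        FROM rental
        WHERE year = ?
        GROUP BY city, area
        ORDER BY city, rentals DESC, area
    """
    top = defaultdict(list)
    for city, area, rentals in conn.execute(query, (year,)):
        # Rows arrive sorted by popularity within each city, so keep the first n.
        if len(top[city]) < n:
            top[city].append((area, rentals))
    return dict(top)


result_2004 = top_areas(conn, 2004)
result_2003 = top_areas(conn, 2003)
print(result_2004)
print(result_2003)
```

Comparing result_2004 with the results for earlier years is possible only because the historical rows are still stored, which an OLTP system would typically not guarantee.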

Problems of data warehouses

 Underestimation of resources for data loading


Many developers underestimate the time required to extract, clean, and load the data into the
warehouse. This process may account for a significant proportion of the total development time,
although better data cleansing and management tools should ultimately reduce the time and
effort spent.

 Hidden problems with source systems


Hidden problems associated with the source systems feeding the data warehouse may be
identified, possibly after years of being undetected. The developer must decide whether to fix
the problem in the data warehouse and/or fix the source systems. For example, when entering
the details of a new property, certain fields may allow nulls, which may result in staff entering
incomplete property data, even when available and applicable.

 Required data not captured


Warehouse projects often highlight a requirement for data not being captured by the existing
source systems. The organization must decide whether to modify the OLTP systems or create a
system dedicated to capturing the missing data.

 Increased end-user demands


After end-users receive query and reporting tools, requests for support from IS staff may
increase rather than decrease. This is caused by the users' increasing awareness of the
capabilities and value of the data warehouse. This problem can be partially alleviated by
investing in easier-to-use, more powerful tools, or in providing better training for the users. A
further reason for increasing demands on IS staff is that once a data warehouse is online, it is
often the case that the number of users and queries increases, together with requests for answers
to more and more complex queries.

 Data homogenization
Large-scale data warehousing can become an exercise in data homogenization that lessens the
value of the data. For example, in producing a consolidated and integrated view of the
organization's data, the warehouse designer may be tempted to emphasize similarities rather
than differences in the data used by different application areas such as property sales and
property renting.

 High demand for resources


The data warehouse can use large amounts of disk space.

 Data ownership
Data warehousing may change the attitude of end-users to the ownership of data. Sensitive data
that was originally viewed and used only by a particular department or business area, such as
sales or marketing, may now be made accessible to others in the organization.

 High maintenance
Data warehouses are high maintenance systems. Any reorganization of the business processes
and the source systems may affect the data warehouse. To remain a valuable resource, the data
warehouse must remain consistent with the organization that it supports.

 Long-duration projects
A data warehouse represents a single data resource for the organization. However, the building
of a warehouse can take up to three years, which is why some organizations are building data
marts.

 Complexity of integration
The most important area for the management of a data warehouse is the integration
capabilities. This means an organization must spend a significant amount of time determining
how well the various data warehousing tools can be integrated into the overall solution
that is needed. This can be a very difficult task, as there are a number of tools for every
operation of the data warehouse, which must integrate well in order that the warehouse works
to the organization's benefit.
DATA MART
 It is a subset of a data warehouse that supports the requirements of a particular department or
business function.
 A data mart holds a subset of the data in a data warehouse normally in the form of summary
data relating to a particular department or business function.
 The data mart can be standalone or linked centrally to the corporate data warehouse.
 As a data warehouse grows larger, the ability to serve the various needs of the organization may
be compromised. The popularity of data marts stems from the fact that corporate-wide data
warehouses are proving difficult to build and use.

The characteristics that differentiate data marts and data warehouses include:
 a data mart focuses on only the requirements of users associated with one department
or business function;
 data marts do not normally contain detailed operational data, unlike data warehouses;
 as data marts contain less data compared with data warehouses, data marts are more
easily understood and navigated.
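
As a rough illustration of a data mart holding summary data for one department, the sketch below derives monthly totals from detailed warehouse rows. The row layout, the department names, the amounts, and the function build_data_mart are all hypothetical.

```python
from collections import defaultdict

# Hypothetical detailed rows held in the corporate data warehouse.
warehouse_rows = [
    {"department": "rentals", "month": "2004-01", "amount": 500},
    {"department": "rentals", "month": "2004-01", "amount": 700},
    {"department": "rentals", "month": "2004-02", "amount": 650},
    {"department": "sales", "month": "2004-01", "amount": 90000},
]


def build_data_mart(rows, department):
    """Summarize one department's rows into monthly totals."""
    totals = defaultdict(int)
    for row in rows:
        # Keep only the subset relevant to this department...
        if row["department"] == department:
            # ...and store a summary rather than the detailed rows.
            totals[row["month"]] += row["amount"]
    return dict(totals)


rentals_mart = build_data_mart(warehouse_rows, "rentals")
print(rentals_mart)
```

The mart is smaller than the warehouse and contains no detailed operational rows, which matches the characteristics listed above.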

Reasons for Creating a Data Mart


There are many reasons for creating a data mart, which include:
 To give users access to the data they need to analyze most often.
 To provide data in a form that matches the collective view of the data by a group of
users in a department or business function.
 To improve end-user response time due to the reduction in the volume of data to be
accessed.
 To provide appropriately structured data as dictated by the requirements of end-user
access tools such as Online Analytical Processing (OLAP) and data mining tools, which
may require their own internal database structures. In practice, these tools often create
their own data mart designed to support their specific functionality.
 Data marts normally use less data so tasks such as data cleansing, loading, transformation, and
integration are far easier, and hence implementing and setting up a data mart is simpler than
establishing a corporate data warehouse.
 The cost of implementing data marts is normally less than that required to establish a
data warehouse.
 The potential users of a data mart are more clearly defined and can be more easily targeted
to obtain support for a data mart project rather than a corporate data warehouse project.
Databases and the web

 If you use the Web to place an order or view a product catalog, you are probably
using a Web site linked to an internal corporate database. Many companies now use the Web to
make some of the information in their internal databases available to customers and business
partners.

 Suppose, for example, a customer with a Web browser wants to search an online retailer's
database for pricing information. The user accesses the retailer's Web site over the Internet
using Web browser software on his or her client PC. The user's Web browser software requests
data from the organization's database by sending HTTP requests, generated from the site's
HTML pages, to the Web server.

 Because many "back-end" databases cannot interpret requests that arrive from a Web page, the Web
server passes these requests for data to software that translates them into SQL so
that they can be processed by the DBMS working with the database. In a client/server
environment, the DBMS resides on a dedicated computer called a database server. The DBMS
receives the SQL requests and provides the required data. The middleware transfers
information from the organization's internal database back to the Web server for delivery in
the form of a Web page to the user.

 The middleware working between the Web server and the DBMS could be an application server
running on its own dedicated computer. The application server software handles all application
operations, including transaction processing and data access, between browser-based
computers and a company's back-end business applications or databases. The application server
takes requests from the Web server, runs the business logic to process transactions based on
those requests, and provides connectivity to the organization's back-end systems or databases.
Alternatively, the software for handling these operations could be a custom program or a CGI
script. A CGI script is a compact program using the Common Gateway Interface (CGI)
specification for processing data on a Web server.
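
The translation step performed by the middleware can be sketched as follows. This is a simplified illustration, not a description of any particular application server: the product table, the URL, the name parameter, and the function handle_request are hypothetical, and SQLite stands in for the back-end DBMS.

```python
import sqlite3
from html import escape
from urllib.parse import parse_qs, urlparse

# Hypothetical back-end database holding pricing information.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [("widget", 9.5), ("gadget", 20.0)],
)


def handle_request(conn, url):
    """Translate a Web request into SQL, run it, and return an HTML fragment."""
    params = parse_qs(urlparse(url).query)
    name = params.get("name", [""])[0]
    # Parameterized query: user input is passed as data, never spliced into the SQL text.
    row = conn.execute(
        "SELECT name, price FROM product WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return "<p>No such product</p>"
    # Escape the value before placing it in the page returned to the browser.
    return f"<p>{escape(row[0])}: {row[1]:.2f}</p>"


page = handle_request(conn, "http://example.com/price?name=widget")
print(page)
```

The browser never sees SQL and the database never sees the Web request; the function in the middle plays the role the text assigns to the application server or CGI script.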

 There are a number of advantages to using the Web to access an organization's internal
databases. First, Web browser software is much easier to use than proprietary query tools.
Second, the Web interface requires few or no changes to the internal database. It costs much
less to add a Web interface in front of a legacy system than to redesign and rebuild the system
to improve user access.

 Accessing corporate databases through the Web is creating new efficiencies, opportunities, and
business models. Can you think of an example of a firm that has benefited in this way?
