102046706 – Data Mining & Business Intelligence
Unit-1
Overview and concepts
Data Warehousing and Business Intelligence
Outline
 Why Reporting & Analyzing Data?
 Introduction to Business Intelligence
 Introduction to Data Warehousing
 Features of Data Warehousing
 Introduction to Data marts
 Types of Data Marts
 Meta Data
Why Reporting & Analyzing Data?
 The amount of data stored in databases is growing exponentially;
databases are now measured in gigabytes (GB) and terabytes (TB).
 However, raw data by itself does not provide useful information.
 In today’s highly competitive business environment, companies need
to turn these terabytes of raw data into some useful information.
 The general methods of analysis/reporting can be broadly classified
into two categories: non-parametric analysis & parametric analysis
 Example
• Managers will generally be more interested in actual data and non-parametric
analysis results, while engineers will be more concerned with parametric
analysis.
What is Business Intelligence?
 BI technologies provide historical, current and predictive views of
business operations.
 Common functions of business intelligence technologies include
reporting, online analytical processing, analytics, data mining,
process mining, business performance management, text mining,
predictive analytics and prescriptive analytics.
 BI technologies can handle large amounts of structured and
sometimes unstructured data to help businesses identify and
develop new strategic opportunities.
 Identifying new opportunities and implementing an effective
strategy based on insights can provide businesses with a
competitive market advantage and long-term stability.
Business Intelligence (Cont..)
 Business intelligence (BI) comprises the strategies and technologies
used by enterprises for the data analysis of business information.
 BI tools access and analyze data sets and present analytical
findings in reports, summaries, dashboards, graphs, charts and
maps to provide users with detailed intelligence about the state
of the business.
 Typical BI infrastructure components are as follows:
• Software solution for gathering, cleansing, integrating, analyzing and
sharing data.
 BI produces analysis and provides credible information to help
make effective, high-quality business decisions.
Business Intelligence (Cont..)
 The most common kinds of business intelligence systems are:
• MIS - Management Information Systems
• CRM - Customer Relationship Management
• EIS - Executive Information Systems
• DSS - Decision Support Systems
• GIS - Geographic Information Systems
• OLAP - Online Analytical Processing
Introduction to Data Warehouse
 Collections of databases that work together are called data
warehouses.
 This makes it possible to integrate data from multiple databases
& it is used to help individuals and organizations make better
decisions.
 A database consists of one or more files that need to be stored on
a computer.
 In large organizations, databases are typically not stored on the
individual computers of employees but in a central system
(server).
Data Warehouse (Cont..)
 According to William H. Inmon, a leading architect in the
construction of data warehouse systems, “A data warehouse is a
subject-oriented, integrated, time-variant, and nonvolatile
collection of data in support of management’s decision making
process”.
 Features of Data Warehousing
• Subject-oriented
• Integrated
• Time-variant
• Nonvolatile
Features of Data Warehouse
 Subject-oriented:
• A data warehouse is organized around major subjects, such as customer,
supplier, product, and sales.
• Rather than concentrating on the day-to-day operations and transaction
processing of an organization, a data warehouse focuses on the modeling
and analysis of data for decision makers.
• Data warehouses typically provide a simple and concise view around
particular subject issues by excluding data that are not useful in the
decision support process.
Features of Data Warehouse (Cont..)
 Integrated:
• A data warehouse is usually constructed by integrating multiple
heterogeneous sources, such as relational databases, flat files, and on-line
transaction records.
• Data cleaning and data integration techniques are applied to ensure
consistency in naming conventions, encoding structures, attribute
measures, and so on.
 Time-variant:
• Data are stored to provide information from a historical perspective (e.g., the
past 5–10 years).
• Every key structure in the data warehouse contains, either implicitly or
explicitly, an element of time.
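Example (sketch): the integrated and time-variant features can be illustrated with a few lines of Python. This is only a minimal sketch under stated assumptions: the two source systems, their field names, and the gender encodings are hypothetical and are not taken from any real system.

```python
from datetime import date

# Hypothetical records from two source systems that encode the same
# attributes differently (names and encodings are illustrative only).
crm_rows = [{"cust_name": "Asha", "gender": "F", "snapshot": date(2023, 1, 31)}]
billing_rows = [{"CustomerName": "Ravi", "sex": "male", "snapshot": date(2023, 2, 28)}]

# One agreed encoding for the warehouse.
GENDER_CODES = {"f": "F", "female": "F", "m": "M", "male": "M"}

def integrate(rows, name_key, gender_key, source):
    """Map one source's rows onto the warehouse's naming and encoding conventions."""
    return [
        {
            "customer_name": row[name_key],
            "gender": GENDER_CODES[row[gender_key].lower()],
            "source": source,
            "snapshot_date": row["snapshot"],  # explicit element of time
        }
        for row in rows
    ]

warehouse_rows = (
    integrate(crm_rows, "cust_name", "gender", "crm")
    + integrate(billing_rows, "CustomerName", "sex", "billing")
)
```

After integration every row uses the same attribute names and the same gender codes, and each row carries a snapshot date so that history can be analyzed.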
Features of Data Warehouse (Cont..)
 Nonvolatile:
• A data warehouse is always a physically separate store of data transformed from
the application data found in the operational environment.
• Due to this separation, a data warehouse does not require transaction
processing, recovery, and concurrency control mechanisms.
• It usually requires only two operations in data accessing: initial loading of data
and access of data.
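Example (sketch): the two operations named above, initial loading and access, can be modeled as a store with no update or delete methods. The class name `NonvolatileStore` and the sample rows are hypothetical; a real warehouse would enforce this at the database level.

```python
class NonvolatileStore:
    """Toy store exposing only two operations: load and access."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        # Loading appends copies; existing rows are never updated or deleted.
        self._rows.extend(dict(row) for row in rows)

    def query(self, predicate=lambda row: True):
        # Access is read-only: callers get copies, not the stored rows.
        return [dict(row) for row in self._rows if predicate(row)]

store = NonvolatileStore()
store.load([{"product": "pen", "qty": 10}, {"product": "ink", "qty": 3}])
pens = store.query(lambda row: row["product"] == "pen")
pens[0]["qty"] = 999  # modifying the returned copy leaves the store untouched
```

Because nothing is modified in place, the sketch needs no locking or rollback logic, which mirrors the point about concurrency control and recovery.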
Data Warehouse Design Process
 A data warehouse can be built using a top-down approach, a bottom-up
approach, or a combination of both.
 Top Down Approach
• The top-down approach starts with the overall design and planning.
• It is useful in cases where the technology is mature and well known, and where the
business problems that must be solved are clear and well understood.
 Bottom up Approach
• The bottom-up approach starts with experiments and prototypes.
• This is useful in the early stage of business modeling and technology development.
• It allows an organization to move forward at considerably less expense and to evaluate
the benefits of the technology before making significant commitments.
 Combined Approach
• In the combined approach, an organization can exploit the planned and strategic
nature of the top-down approach while retaining the rapid implementation and
opportunistic application of the bottom-up approach.
Types of Data Warehouse
 The three main types of data warehouses are:
• Enterprise Data Warehouse
• Operational Data Store
• Data Mart
Data Warehouse Types (Cont..)
 Enterprise Data Warehouse:
• Enterprise Data Warehouse is a centralized warehouse, which provides decision
support service across the enterprise.
• It offers a unified approach to organizing and representing data.
• It also provides the ability to classify data according to the subject and give access
according to those divisions.
 Operational Data Store:
• An Operational Data Store (ODS) is a data store required when neither the data
warehouse nor the OLTP systems support an organization's reporting needs.
• It is widely preferred for routine activities like storing records.
• An ODS is refreshed in real time.
 Data Mart:
• A Data Mart is a subset of the data warehouse.
• It is specially designed for a specific segment of the business, such as sales or finance.
• In an independent data mart, data can be collected directly from sources.
Introduction to Data Marts
 A data mart contains a subset of corporate-wide data that is of
value to a specific group of users.
 The scope is confined to specific selected subjects.
 For example, a marketing data mart may confine its subjects to
customer, item, and sales.
 The data contained in data marts tend to be summarized.
Introduction to Data Marts (Cont..)
 Depending on the source of data, data marts can be categorized
as independent or dependent.
 Independent data marts are sourced from data captured from one
or more operational systems or external information providers, or
from data generated locally within a particular department or
geographic area.
 Dependent data marts are sourced directly from enterprise data
warehouses.
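Example (sketch): a dependent data mart can be pictured as a summarized, single-subject table built from the enterprise warehouse. The sketch below uses Python's built-in `sqlite3` module; the table names `warehouse_sales` and `mart_sales_by_region` and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical enterprise warehouse table with detailed rows.
    CREATE TABLE warehouse_sales (region TEXT, item TEXT, sale_date TEXT, amount REAL);
    INSERT INTO warehouse_sales VALUES
        ('North', 'pen', '2023-01-05', 20.0),
        ('North', 'pen', '2023-01-09', 30.0),
        ('South', 'ink', '2023-01-07', 15.0);

    -- Dependent data mart: sourced from the warehouse, one subject, summarized.
    CREATE TABLE mart_sales_by_region AS
        SELECT region, SUM(amount) AS total_amount
        FROM warehouse_sales
        GROUP BY region;
""")

mart_rows = conn.execute(
    "SELECT region, total_amount FROM mart_sales_by_region ORDER BY region"
).fetchall()
```

An independent data mart would look the same on the output side, but its `SELECT` would read from operational or external sources instead of the warehouse table.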
Virtual warehouse
 A virtual warehouse is a set of views over operational databases.
 For efficient query processing, only some of the possible summary
views may be materialized.
 A virtual warehouse is easy to build but requires excess capacity
on operational database servers.
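Example (sketch): the difference between a plain view and a materialized summary can be shown with Python's built-in `sqlite3` module. The table and view names are hypothetical. SQLite has no materialized views, so a table filled from the view stands in for one here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical operational (OLTP) table.
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
    INSERT INTO orders (customer, amount)
        VALUES ('Asha', 40.0), ('Asha', 10.0), ('Ravi', 25.0);

    -- Virtual warehouse: a view computed on demand from the operational table.
    CREATE VIEW v_customer_totals AS
        SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer;

    -- Stand-in for a materialized summary: results stored once in a table.
    CREATE TABLE m_customer_totals AS SELECT * FROM v_customer_totals;
""")

# A new operational row is visible through the view immediately,
# but not in the stored summary until it is refreshed.
conn.execute("INSERT INTO orders (customer, amount) VALUES ('Ravi', 5.0)")
live = dict(conn.execute("SELECT customer, total FROM v_customer_totals"))
stored = dict(conn.execute("SELECT customer, total FROM m_customer_totals"))
```

Every query against the view runs on the operational server, which is why a virtual warehouse needs spare capacity there; the stored summary avoids that cost but can go stale.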
Data Warehouse v/s Data Mart
 Data warehouse:
• Holds multiple subject areas
• Holds very detailed information
• Works to integrate all data sources
• Size (typical) 100 GB-TB+
• Implementation Time : Months to Years
 Data mart:
• Often holds only one subject area- for example, Finance, or Sales
• May hold more summarized data
• Concentrates on integrating information from a given subject area or set of
source systems
• Size (typical) < 100GB
• Implementation Time : Months
Meta data
 Metadata are data about data.
 When used in a data warehouse, metadata are the data that define
warehouse objects.
 Metadata are created for the data names and definitions of the
given warehouse.
 Additional metadata are created and captured for time stamping
any extracted data, the source of the extracted data, and missing
fields that have been added by data cleaning or integration
processes.
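Example (sketch): the additional metadata described above (time stamp, source of the extracted data, and fields filled in by cleaning) could be captured as follows. The `ExtractMetadata` class, the `clean` helper, and the source name `billing_db` are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ExtractMetadata:
    """Hypothetical metadata record for one extracted batch."""
    source: str
    extracted_at: datetime
    filled_fields: list = field(default_factory=list)

def clean(row, defaults, meta):
    """Fill missing fields from defaults and record which fields were added."""
    cleaned = dict(row)
    for name, value in defaults.items():
        if cleaned.get(name) is None:
            cleaned[name] = value
            if name not in meta.filled_fields:
                meta.filled_fields.append(name)
    return cleaned

meta = ExtractMetadata(source="billing_db", extracted_at=datetime.now(timezone.utc))
row = clean({"customer": "Asha", "city": None}, {"city": "UNKNOWN"}, meta)
```

The cleaned row goes into the warehouse, while `meta` records where the batch came from, when it was extracted, and which fields the cleaning step had to add.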
Meta data (Cont..)
 A metadata repository should contain the following:
 A description of the data warehouse structure, which includes the
warehouse schema, view, dimensions, hierarchies, and derived data
definitions, as well as data mart locations and contents.
 Operational metadata, which include data lineage (history of
migrated data and the sequence of transformations applied to it),
currency of data (active, archived, or purged), and monitoring
information (warehouse usage statistics, error reports, and audit
trails).
 The algorithms used for summarization, which include measure and
dimension definition algorithms, data on granularity, partitions,
subject areas, aggregation, summarization, and predefined queries
and reports.
Meta data (Cont..)
 Mapping from the operational environment to the data
warehouse, which includes databases and their contents, gateway
descriptions, data partitions, data extraction, cleaning,
transformation rules and defaults, data refresh and purging rules,
and security (user authorization and access control).
 Data related to system performance, which include indices and
profiles that improve data access and retrieval performance, in
addition to rules for the timing and scheduling of refresh, update,
and replication cycles.
 Business metadata, which include business terms and definitions,
data ownership information, and charging policies.
