Lecture: OLAP and Its Operations


Data Mining and Warehousing

Course Code IT3240

Subject Guide: Dr. Sumit Dhariwal


February 1, 2023 Data Mining and Warehousing (IT3240): Sec D 2
Online Analytical Processing (OLAP)
and Its Operations

Course Objective

• Be familiar with the mathematical foundations of data mining tools.
• Understand and implement classical models and algorithms in data
warehouses and data mining.
• Characterize the kinds of patterns that can be discovered by
association rule mining, classification and clustering.
• Master data mining techniques in various application contexts, such as
social, scientific and environmental ones.
• Develop skill in selecting the appropriate data mining algorithm for
solving practical problems.



Course Outcomes

• Understand the functionality of the various data mining and data
warehousing components.
• Appreciate the strengths and limitations of various data mining and
data warehousing models.
• Explain the techniques used to analyse various kinds of data.
• Describe different methodologies used in data mining and data
warehousing.
• Compare different approaches to data warehousing and data mining
with various technologies.



Syllabus

• Data warehousing: Introduction to Data Warehouse, Statistical Observation on Data, Data Types, DBMS
Schemas for Decision Support, Data Mart, Data Extraction, Transformation and Load (ETL) Operations,
Metadata; Online Analytical Processing (OLAP), Online Transaction Processing (OLTP), ROLAP,
MOLAP, HOLAP and their Operations, Bitmap Indexing, Join Indexing, Attribute Selection Measure,
BUC Cubing Method, Data Cubing, Star Tree Construction, Inverted Index.

• Data Mining: Introduction to Data Mining & Applications, Types of Data, Pre-Processing, KDD Process.

• Association Rule Mining (ARM): Interestingness of Patterns, Mining Frequent Patterns, K-Frequent
Item Set Mining, A-Priori Algorithm, Associations and Correlations Mining, Correlation Analysis,
Constraint Based Association Mining.

• Classification and Prediction: Basic Concepts, Entropy, Decision Tree, Naïve Bayes Algorithm, Neural
Networks, Back Propagation, Support Vector Machines, Associative Classification, Lazy Learners,
Prediction.

• Clustering: Basic Concepts, Cluster Analysis, K-Means, Partitioning Methods, Hierarchical Clustering,
Expectation Maximization, Density based Clustering, Web Mining, Text Mining, Spatial Mining.

• Case Study: Case Studies on Various Data Mining Techniques with Varying Data Sets.



Reference Books:
1. J. Han and M. Kamber, Data Mining: Concepts and Techniques, (3e), Elsevier,
2007.
2. P. N. Tan, M. Steinbach and V. Kumar, Introduction to Data Mining, (1e),
Pearson Education India, 2007.
3. A. Berson and S. J. Smith, Data Warehousing, Data Mining & OLAP, (10e),
Tata McGraw-Hill, 2007.
Assessment Pattern:

• At least two assignments
• At least three quizzes
• Two Mid Term Examinations
• One End Term Examination
• Quizzes can be conducted without prior announcement
• Weight in the final assessment:
  • Assignments: 10
  • Quizzes: 10



Content
• Introduction to OLAP

• History of OLAP

• OLAP Cube

• Difference Between OLAP & OLTP

• OLAP Operations

• Benefits of OLAP
Introduction

• OLAP (online analytical processing) is computer processing that
enables a user to easily and selectively extract and view data from
different points of view.
• OLAP allows users to analyze database information from multiple
database systems at one time.
History of OLAP

• In 1993, E. F. Codd came up with the term OLAP and defined an
OLAP database.
• The first product that performed OLAP queries was Express, which
was released in 1970 (and acquired by Oracle in 1995 from
Information Resources).
Introduction

• OLAP databases contain two basic types of data:
  • measures, which are numeric data: the quantities and averages that
  you use to make informed business decisions (e.g., average sale), and
  • dimensions, which are the categories that you use to organize these
  measures (e.g., time, place).
• OLAP databases help organize data by many levels of detail, using the
same categories that you are familiar with to analyze the data.
The Complete Decision Support System

[Figure: three-tier decision support architecture. Information sources
(operational DBs, semistructured sources) are extracted, transformed,
loaded and refreshed into the data warehouse server (Tier 1), which holds
the data warehouse and the data marts. Tier 1 serves the OLAP servers
(Tier 2; e.g., MOLAP, ROLAP), which in turn serve the clients (Tier 3):
analysis, query/reporting and data mining.]
OLTP vs. OLAP
OLAP Cube

• An OLAP cube is a data structure that allows fast analysis of data.
• The arrangement of data into cubes overcomes a limitation of
relational databases.
• The OLAP cube consists of numeric facts called measures, which are
categorized by dimensions.
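
A minimal sketch of the idea, assuming a cube can be modelled as a dictionary whose keys are dimension values and whose values are the aggregated measure. The fact rows, the dimension order (time, location, item) and the name build_cube are illustrative assumptions, not part of the lecture.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions (time, location, item)
# and one measure (units sold). All values are made up.
facts = [
    ("Q1", "Toronto", "Mobile", 605),
    ("Q1", "Vancouver", "Mobile", 825),
    ("Q2", "Toronto", "Modem", 14),
    ("Q1", "Toronto", "Mobile", 395),  # second sale falling in the same cell
]


def build_cube(rows):
    """Sum the measure into cells keyed by (time, location, item)."""
    cube = defaultdict(int)
    for time, location, item, units in rows:
        cube[(time, location, item)] += units
    return dict(cube)


cube = build_cube(facts)
print(cube[("Q1", "Toronto", "Mobile")])  # 1000
```

Each key is one cell of the cube, so looking up a combination of dimension values is a single dictionary access.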
OLAP Cube
OLAP Operations

• Several kinds of operations can be performed in OLAP:
  • Roll-up
  • Drill-down
  • Slice
  • Dice
  • Pivot
  • Drill-across
  • Drill-through
Typical OLAP Operations

• Roll-up (drill-up): summarize data
  • by climbing up a hierarchy or by dimension reduction
• Drill-down (roll-down): reverse of roll-up
  • from a higher-level summary to a lower-level summary or detailed
  data, or by introducing new dimensions
• Slice and dice: project and select
• Pivot (rotate):
  • reorient the cube for visualization, e.g. 3D to a series of 2D planes
• Other operations
  • drill-across: involving (across) more than one fact table
  • drill-through: through the bottom level of the cube to its back-end
  relational tables (using SQL)
Roll-up

• Takes the current aggregation level of the fact values and aggregates
further on one or more of the dimensions.
• Equivalent to a GROUP BY on the dimension at a higher level of its
attribute hierarchy.
• General form: SELECT [attribute list], SUM([measure]) FROM [table list]
WHERE [condition list] GROUP BY [grouping list]
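
The GROUP BY form of roll-up can be tried with Python's built-in sqlite3 module. This is only a sketch: the sales table, its columns and the figures are hypothetical, and the city-to-country hierarchy is stored directly in the table for brevity.

```python
import sqlite3

# In-memory database with a hypothetical sales fact table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (city TEXT, country TEXT, quarter TEXT, units INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Toronto", "Canada", "Q1", 605),
        ("Vancouver", "Canada", "Q1", 825),
        ("Chicago", "USA", "Q1", 440),
        ("New York", "USA", "Q1", 1560),
    ],
)

# Roll up on location: group by country instead of city.
rolled_up = conn.execute(
    "SELECT country, quarter, SUM(units) FROM sales "
    "GROUP BY country, quarter ORDER BY country"
).fetchall()
print(rolled_up)  # [('Canada', 'Q1', 1430), ('USA', 'Q1', 2000)]
```

Four city-level rows collapse into two country-level rows, which is the move from more detailed to less detailed data.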
Example

• Roll up on Location from cities to countries.
• More detailed data to less detailed data.
Before Roll up

After Roll up
Example of Roll up
Drill-down

• Drill-down is the reverse of roll-up.
• It moves from a higher-level summary to a lower-level summary, or to
detailed data.
• It may increase the number of dimensions (adds new headers).
• Drill-down can be performed either by
  • stepping down a concept hierarchy for a dimension, or
  • introducing a new dimension.
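
A small sketch of stepping down a concept hierarchy, assuming a time hierarchy of quarter over month. The rows and the helper name aggregate are hypothetical.

```python
from collections import defaultdict

# Hypothetical detail rows: (quarter, month, units). Values are made up.
rows = [
    ("Q1", "Jan", 100),
    ("Q1", "Feb", 150),
    ("Q1", "Mar", 200),
    ("Q2", "Apr", 120),
]


def aggregate(rows, key):
    """Sum units per group, where key(row) picks the grouping level."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[2]
    return dict(totals)


# Summary level: one total per quarter.
by_quarter = aggregate(rows, key=lambda r: r[0])
# Drill down on time: step from quarter to month.
by_month = aggregate(rows, key=lambda r: (r[0], r[1]))

print(by_quarter)  # {'Q1': 450, 'Q2': 120}
print(by_month[("Q1", "Jan")])  # 100
```

The quarter view has two cells and the month view has four, so drilling down exposes more detail for the same data.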
Example

After Drill Down

Before Drill down


Example of Drill down
Slice

• Performs a selection on one dimension of the given cube, resulting in
a sub-cube.
• Sets one or more dimensions to specific values and keeps the remaining
dimensions for the selected values.
Example
• Here Slice is performed for the
dimension "time" using the
criterion time = "Q1".
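
The slice time = "Q1" can be sketched as follows, assuming the cube is a dictionary keyed by (time, location, item). The cell values and the name slice_cube are hypothetical.

```python
# Hypothetical cube: keys are (time, location, item), values are units sold.
cube = {
    ("Q1", "Toronto", "Mobile"): 605,
    ("Q1", "Vancouver", "Modem"): 30,
    ("Q2", "Toronto", "Mobile"): 680,
}


def slice_cube(cube, axis, value):
    """Fix one dimension to a value and drop that dimension from the keys."""
    return {
        key[:axis] + key[axis + 1:]: measure
        for key, measure in cube.items()
        if key[axis] == value
    }


q1 = slice_cube(cube, axis=0, value="Q1")
print(q1)  # {('Toronto', 'Mobile'): 605, ('Vancouver', 'Modem'): 30}
```

The result has one dimension fewer than the input: only location and item remain.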
Dice

• Defines a sub-cube by performing a selection on one or more
dimensions.
• Refers to a range-select condition on one dimension, or to a select
condition on more than one dimension.
• Reduces the number of member values of one or more dimensions.
Example
• The dice operation on the cube based on
the following selection criteria involves
three dimensions.
• (location = "Toronto" or "Vancouver")
• (time = "Q1" or "Q2")
• (item = "Mobile" or "Modem")
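
The three selection criteria above can be sketched as a filter over cube cells. The cube contents and the name dice are hypothetical; only the criteria come from the slide.

```python
# Hypothetical cube: keys are (time, location, item), values are units sold.
cube = {
    ("Q1", "Toronto", "Mobile"): 605,
    ("Q2", "Vancouver", "Modem"): 30,
    ("Q3", "Toronto", "Mobile"): 680,
    ("Q1", "Chicago", "Modem"): 25,
    ("Q2", "Toronto", "Phone"): 50,
}


def dice(cube, allowed):
    """Keep cells whose value on every listed axis is in the allowed set."""
    return {
        key: measure
        for key, measure in cube.items()
        if all(key[axis] in values for axis, values in allowed.items())
    }


sub_cube = dice(
    cube,
    {
        0: {"Q1", "Q2"},               # time
        1: {"Toronto", "Vancouver"},   # location
        2: {"Mobile", "Modem"},        # item
    },
)
print(sub_cube)
# {('Q1', 'Toronto', 'Mobile'): 605, ('Q2', 'Vancouver', 'Modem'): 30}
```

Unlike slice, dice keeps all three dimensions and only reduces the member values on each.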
Example of Slice & dice
Pivot

• Pivot is also known as rotate.
• It rotates the data axes to view the data from different perspectives.
• It groups data with different dimensions.
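
A minimal sketch of pivot on a 2D view, assuming the view is a nested dictionary of rows and columns. The table contents and the function name pivot are hypothetical.

```python
# Hypothetical 2D view of a cube: rows are locations, columns are items.
table = {
    "Toronto": {"Mobile": 605, "Modem": 14},
    "Vancouver": {"Mobile": 825, "Modem": 31},
}


def pivot(table):
    """Swap the row and column axes of a nested-dict table."""
    rotated = {}
    for row_label, row in table.items():
        for col_label, value in row.items():
            rotated.setdefault(col_label, {})[row_label] = value
    return rotated


rotated = pivot(table)
print(rotated["Mobile"])  # {'Toronto': 605, 'Vancouver': 825}
```

No values change; only the axis on which each dimension is laid out.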

Example
Drill across & Drill through

• Drill-across: accesses more than one fact table linked by common
dimensions. Combines cubes that share one or more dimensions.
• Drill-through: drills down through the bottom level of a data cube to
its back-end relational tables.
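
Drill-across can be sketched as combining two fact tables on their shared dimensions. The two cubes (sales and returns), their values and the name drill_across are hypothetical.

```python
# Two hypothetical fact tables sharing the dimensions (time, location).
sales = {("Q1", "Toronto"): 605, ("Q2", "Toronto"): 680}
returns = {("Q1", "Toronto"): 12, ("Q2", "Toronto"): 9, ("Q1", "Vancouver"): 4}


def drill_across(first, second):
    """Pair up the measures of cells that exist in both fact tables."""
    shared = first.keys() & second.keys()
    return {key: (first[key], second[key]) for key in sorted(shared)}


combined = drill_across(sales, returns)
print(combined)
# {('Q1', 'Toronto'): (605, 12), ('Q2', 'Toronto'): (680, 9)}
```

This sketch keeps only cells present in both tables; a real tool might instead keep unmatched cells with empty measures.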
Drill Across
Drill Through
Exercise

1. Compute the sales quantity by country.
2. Try to understand why sales of seafood in Q1 are higher than those of
the other products.
3. Try to understand why sales of seafood in January were much higher.
4. Visualize the cube with the time dimension on the X axis.
5. Visualize data only for Paris.
6. Visualize data only for Paris or Lyon and quarters Q1 or Q2.
Solution
Other OLAP Operations

• Sort
  • Returns a cube in which the members of a dimension are sorted.
• Add Measure
  • Adds new measures to a cube.
• Drop Measure
  • In contrast to Add Measure, removes a measure from a data cube when
  it is no longer needed.
• Union
  • Merges several cubes that have the same schema but different
  instances.
• Difference
  • Removes from one cube the cells that also belong to another cube.
  The two cubes must have the same schema.
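
Union and Difference can be sketched on dictionary-based cubes with the same schema. The cube contents and function names are hypothetical, and the rule that the second cube's value wins when a cell appears in both is an assumption of this sketch, not something stated in the lecture.

```python
# Two hypothetical cubes with the same schema: keys are (time, location).
cube_a = {("Q1", "Toronto"): 605, ("Q2", "Toronto"): 680}
cube_b = {("Q2", "Toronto"): 680, ("Q1", "Vancouver"): 825}


def union(first, second):
    """Merge two cubes; on a shared cell the second cube's value is kept."""
    merged = dict(first)
    merged.update(second)
    return merged


def difference(first, second):
    """Keep the cells of the first cube that do not appear in the second."""
    return {key: value for key, value in first.items() if key not in second}


print(len(union(cube_a, cube_b)))  # 3
print(difference(cube_a, cube_b))  # {('Q1', 'Toronto'): 605}
```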
Union
Add and Drop Measure
Sort
Benefits

• OLAP offers four key benefits:
  • Business-focused multidimensional data
  • Business-focused calculations
  • Trustworthy data and calculations
  • Speed-of-thought analysis
