Lecture 9
Chapter 20
Where we are now
[Figure: chapter roadmap. Recoverable labels: 7. Management Analytics; 8. Financial Statement Analytics; 9. Tax Analytics]
[Figure: IMPACT cycle. Recoverable labels: Identify the questions; Perform test plan; Communicate insights]
Chapter Objectives
• Understand four categories of Data Analytics.
• Describe some descriptive analytics approaches, including
summary statistics and data reduction.
• Explain the diagnostic approach to Data Analytics, including
profiling and clustering.
• Understand predictive analytics, including regression and
classification.
• Describe the use of prescriptive analytics, including machine
learning and artificial intelligence.
© McGraw Hill 4
Learning Objective 3-1
There are four main categories of data analytics.
• Descriptive analytics are procedures that summarize existing data to determine what has happened in the past.
• Diagnostic analytics are procedures that explore the current data to determine why something has happened the way it has, typically comparing the data to a benchmark.
• Predictive analytics are procedures used to generate a model that can be used to determine what is likely to happen in the future.
• Prescriptive analytics are procedures that model data to enable recommendations for what should be done in the future.
Each stage takes additional effort but
provides additional value.
Descriptive analytics examples:
• Summary statistics describe a set of data in terms of their location (mean, median), range (standard deviation, minimum, maximum), shape (quartile), and size (count).
• Data reduction or filtering is used to reduce the number of observations to focus on relevant items (that is, highest cost, highest risk, largest impact, etc.). It does this by taking a large set of data (perhaps the population) and reducing it to a smaller set that has the vast majority of the critical information of the larger set.
Diagnostic analytics examples:
• Profiling identifies the “typical” behavior of an individual, group, or population by compiling summary statistics about the data (including mean, standard deviations, etc.) and comparing individuals to the population.
• Clustering helps identify groups (or clusters) of individuals (such as customers) that share common underlying characteristics; in other words, identifying groups of similar data elements and the underlying drivers of those groups.
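To make clustering concrete, here is a minimal sketch of k-means, one common clustering algorithm. The slides do not name a specific algorithm, so this choice is an assumption, as are the function name `k_means_1d`, the starting centers, and the customer spending figures.

```python
from statistics import mean

def k_means_1d(values, centers, max_iter=100):
    """Group 1-D values around the nearest of the given starting centers."""
    centers = list(centers)
    for _ in range(max_iter):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break  # centers stopped moving, so the grouping is stable
        centers = new_centers
    return centers, clusters

# Hypothetical annual spending per customer (illustrative numbers only).
spending = [120, 135, 150, 900, 950, 1000]
centers, clusters = k_means_1d(spending, centers=[100, 1000])
```

With this data the sketch should separate low spenders from high spenders; real clustering usually works on several variables at once and needs a considered choice of the number of clusters.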
More diagnostic analytics examples:
• Similarity matching is a grouping technique used to identify similar individuals based on data known about them.
• Co-occurrence grouping discovers associations between individuals based on common events, such as transactions they are involved in.
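A rough sketch of co-occurrence grouping: count how often two items appear in the same transaction. The transaction data and item names are hypothetical, and counting pairs is only one simple way to surface associations.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set lists the items bought together.
transactions = [
    {"printer", "ink", "paper"},
    {"printer", "ink"},
    {"paper", "stapler"},
    {"printer", "ink", "stapler"},
]

pair_counts = Counter()
for basket in transactions:
    # Sort so that each pair is always counted under the same key.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# The pair that appears together most often.
most_common_pair, count = pair_counts.most_common(1)[0]
```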
Predictive analytics examples:
• Regression estimates or predicts the numerical value of a dependent variable based on the slope and intercept of a line and the value of an independent variable.
• Classification predicts a class or category for a new observation based on the manual identification of classes from previous observations.
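As an illustration of the regression bullet, the sketch below fits a line by ordinary least squares and uses its slope and intercept to predict a new value. The data and the names `fit_line`, `ad_spend`, and `sales` are hypothetical.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares with one independent variable: y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical data: advertising spend (x) and sales (y), both in $ thousands.
ad_spend = [1, 2, 3, 4, 5]
sales = [12, 14, 16, 18, 20]

slope, intercept = fit_line(ad_spend, sales)
predicted_sales = intercept + slope * 6  # prediction for a new value of x
```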
More predictive analytics examples:
• Link prediction predicts a relationship between two data items,
such as members of a social media platform.
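One simple way to sketch link prediction is to score each unconnected pair by the number of connections they share (a "common neighbors" score). This scoring rule is an assumption chosen for illustration, and the member names and graph are hypothetical.

```python
# Hypothetical social graph: each member maps to the set of members they are connected to.
connections = {
    "ana": {"ben", "cara", "dev"},
    "ben": {"ana", "cara"},
    "cara": {"ana", "ben", "dev"},
    "dev": {"ana", "cara"},
    "eli": {"fay"},
    "fay": {"eli"},
}

def common_neighbors(graph, a, b):
    """Number of members connected to both a and b."""
    return len(graph[a] & graph[b])

# Score every pair of members that is not already connected.
members = sorted(connections)
scores = {
    (a, b): common_neighbors(connections, a, b)
    for i, a in enumerate(members)
    for b in members[i + 1:]
    if b not in connections[a]
}

# The highest-scoring pair is the most likely new relationship under this rule.
best_pair = max(scores, key=scores.get)
```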
Prescriptive analytics examples:
• Decision support systems are rule-based systems that gather data and recommend actions based on the input.
• Machine learning and artificial intelligence are learning models or intelligent agents that adapt to new external data to recommend a course of action.
Learning Objective 3-2
Descriptive analytics help summarize what has happened in the past.
• A financial accountant would sum all the sales transactions within a period to calculate the value for Sales Revenue that appears on the income statement.
• An analyst would count the number of records in a data extract to ensure the data are complete before running a more complex analysis.
• An auditor would filter data to limit the scope to transactions that represent the highest risk.
In all these cases, basic analysis provides an understanding of what has happened in the past to help decision makers achieve good results and correct poor results.
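The three examples above (sum, count, filter) can be sketched in a few lines. The invoice records and the risk threshold are hypothetical.

```python
# Hypothetical sales records: (invoice id, amount in dollars).
sales = [
    ("INV-001", 1200.00),
    ("INV-002", 85.50),
    ("INV-003", 15000.00),
    ("INV-004", 430.25),
]

# Financial accountant: total the period's transactions.
sales_revenue = sum(amount for _, amount in sales)

# Analyst: count records to compare against the expected number.
record_count = len(sales)

# Auditor: keep only transactions above an assumed risk threshold.
THRESHOLD = 1000.00
high_risk = [invoice for invoice, amount in sales if amount > THRESHOLD]
```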
Summary statistics
• Summary statistics describe the location, spread, shape, and size of the data.
Summary Statistic | Excel formula | Description
Sum | =SUM() | The total value of all numerical values
Mean | =AVERAGE() | The center value; sum of all observations divided by the number of observations
Median | =MEDIAN() | The middle value that divides the top half of the data from the bottom half
Minimum | =MIN() | The smallest value
Maximum | =MAX() | The largest value
Count | =COUNT() | The number of observations
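For comparison with the Excel formulas, the same summary statistics can be sketched with Python's standard library. The invoice amounts are hypothetical; note that `len` counts every element, whereas Excel's COUNT only counts numeric cells, so the two agree only when all values are numeric.

```python
import statistics

amounts = [120, 80, 100, 300, 100]  # hypothetical invoice amounts

summary = {
    "sum": sum(amounts),                   # =SUM()
    "mean": statistics.mean(amounts),      # =AVERAGE()
    "median": statistics.median(amounts),  # =MEDIAN()
    "min": min(amounts),                   # =MIN()
    "max": max(amounts),                   # =MAX()
    "count": len(amounts),                 # =COUNT()
}
```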
Fuzzy matching locates approximate
matches
• Useful for
identifying
relationships in
imperfect data.
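A small sketch of fuzzy matching using `difflib` from Python's standard library. The vendor names and the 0.8 similarity cutoff are assumptions for illustration; dedicated tools may use other similarity measures.

```python
import difflib

# Hypothetical vendor master list and a name keyed slightly differently on an invoice.
vendors = ["Acme Supply Co.", "Baker Tools Inc.", "Northwind Traders"]
invoice_name = "Acme Suply Co"

# Return the closest vendor name whose similarity ratio is at least the cutoff.
matches = difflib.get_close_matches(invoice_name, vendors, n=1, cutoff=0.8)
best_match = matches[0] if matches else None
```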
Q. Describe how the data reduction
approach could be used to evaluate
employee travel and entertainment
expenses.
Learning Objective 3-3
Diagnostic analytics
• Diagnostic analytics provide insight into why things happened or
how individual data values relate to the general population.
Profiling compares an individual to the
population
• Profiling is done primarily using structured data—data that are
stored in a database or spreadsheet and are readily
searchable.
• Profiling is used to discover patterns of behavior. In this
example, the higher the Z-score (farther away from the mean),
the more likely a customer will have a delayed shipment
(blue circle).
Profiling relies on gathering summary
statistics and identifying outliers.
• Identify the objects or activity you want to profile.
• Determine the types of profiling you want to perform.
• Set boundaries or thresholds for the activity.
• Interpret the results and monitor the activity and/or generate a
list of exceptions.
• Follow up on exceptions.
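The profiling steps above can be sketched with Z-scores: compute each individual's distance from the mean in standard deviations, set a threshold, and list the exceptions. The customer data and the threshold of 2.0 are assumptions, and the population standard deviation is used here for simplicity.

```python
import statistics

# Hypothetical days-to-ship per customer (the activity being profiled).
days_to_ship = {
    "C01": 4, "C02": 5, "C03": 5, "C04": 6,
    "C05": 5, "C06": 4, "C07": 6, "C08": 21,
}

values = list(days_to_ship.values())
mu = statistics.mean(values)
sigma = statistics.pstdev(values)  # population standard deviation

# Z-score: how many standard deviations each customer is from the mean.
z_scores = {cust: (days - mu) / sigma for cust, days in days_to_ship.items()}

# Assumed boundary: anything more than 2 standard deviations away is an exception.
THRESHOLD = 2.0
exceptions = [cust for cust, z in z_scores.items() if abs(z) > THRESHOLD]
```

The exception list is the starting point for follow-up, not a conclusion; an unusual value may have a legitimate explanation.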
Z-Scores and box plots show spread
and outliers.
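A box plot's view of outliers can be sketched numerically with quartiles. The 1.5 × IQR fences below are the common box-plot convention rather than something stated on the slide, and the payment amounts are hypothetical.

```python
import statistics

payments = [210, 220, 230, 240, 250, 260, 270, 280, 900]  # hypothetical amounts

# Quartiles mark the edges of the box; "inclusive" treats the data as the full population.
q1, q2, q3 = statistics.quantiles(payments, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range: the height of the box

# Conventional whisker limits: points beyond them are drawn as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [p for p in payments if p < lower_fence or p > upper_fence]
```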
Exhibit 3-12 Cluster Analysis of Insurance Payments
Hypothesis testing is used to assess whether differences between groups are statistically significant.
• Begin by setting the null hypothesis H0 (no relationship) and the alternative hypothesis HA (expected relationship).
• Test the p-value for statistical significance.
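The slide does not say which test to use, so as one possible illustration the sketch below runs an exact permutation test on the difference in group means. The function name, the data, and the 0.05 significance level are assumptions.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference in group means."""
    pooled = group_a + group_b
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    # Under H0 the group labels are interchangeable, so try every relabeling.
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point rounding in the comparison.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Hypothetical days-to-pay for two customer groups.
group_a = [10, 11, 12, 13, 14]
group_b = [20, 21, 22, 23, 24]

p_value = permutation_p_value(group_a, group_b)
ALPHA = 0.05                 # assumed significance level
reject_h0 = p_value < ALPHA  # True means the data favor HA over H0
```

Enumerating every relabeling is only practical for small groups; larger samples would typically use a t-test or random resampling instead.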
The goal of classification is to predict
which class an individual belongs to.
• Identify the classes you wish to predict.
• Manually classify an existing set of records.
• Select a set of classification models.
• Divide your data into training and testing sets.
• Generate your model.
• Interpret the results and select the “best” model.
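The steps above can be sketched end to end on toy data. Everything here is an assumption for illustration: the records, the two candidate models (a majority-class baseline and a nearest-neighbor rule), and the fixed split; in practice the training and testing sets are usually chosen at random.

```python
from collections import Counter

# Manually classified records: (invoice amount, class label). Hypothetical data.
records = [
    (100, "normal"), (120, "normal"), (150, "normal"), (200, "normal"),
    (250, "normal"), (900, "review"), (950, "review"), (1000, "review"),
    (130, "normal"), (980, "review"), (1100, "review"),
]

# Divide the data into training and testing sets.
training, testing = records[:8], records[8:]

def majority_model(train):
    """Baseline: always predict the most common class in the training data."""
    most_common = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda amount: most_common

def nearest_neighbor_model(train):
    """Predict the class of the training record with the closest amount."""
    return lambda amount: min(train, key=lambda rec: abs(rec[0] - amount))[1]

def accuracy(model, test):
    """Share of test records the model classifies correctly."""
    return sum(model(amount) == label for amount, label in test) / len(test)

# Generate each candidate model from the training data only.
candidates = {
    "majority": majority_model(training),
    "nearest_neighbor": nearest_neighbor_model(training),
}

# Evaluate on the held-out test data and select the "best" model.
results = {name: accuracy(model, testing) for name, model in candidates.items()}
best_model = max(results, key=results.get)
```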
Classification begins with decision
boundaries.
• Training data are existing
data that have been
manually evaluated and
assigned a class.
• Test data are existing data
used to evaluate the
model.
• Decision trees are used to
divide data into smaller
groups.
• Decision boundaries mark
the split between one
class and another.
Exhibit 3-16 Example of Decision Trees and Decision Boundaries
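A decision boundary can be illustrated with the smallest possible decision tree: a single split on one variable. This sketch assumes records above the boundary belong to the "review" class; the data and function names are hypothetical.

```python
def best_threshold(records):
    """Find the single split on amount that misclassifies the fewest training records."""
    amounts = sorted({amount for amount, _ in records})
    # Candidate boundaries sit halfway between neighboring observed values.
    candidates = [(lo + hi) / 2 for lo, hi in zip(amounts, amounts[1:])]

    def errors(threshold):
        # A record is misclassified when "above the threshold" disagrees with its label.
        return sum((amount > threshold) != (label == "review") for amount, label in records)

    return min(candidates, key=errors)

# Training data: existing records that were manually assigned a class.
training = [
    (100, "normal"), (150, "normal"), (200, "normal"),
    (800, "review"), (900, "review"), (1000, "review"),
]
boundary = best_threshold(training)

def classify(amount):
    """Apply the learned decision boundary to a new observation."""
    return "review" if amount > boundary else "normal"
```

A full decision tree repeats this kind of split within each resulting group, which is how it divides the data into progressively smaller groups.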
What else do you need to know about
classification? 2
• Pruning removes branches from a decision tree to avoid overfitting the model.
Exhibit 3-17 Illustration of Pruning a Decision Tree
What else do you need to know about
classification? 3
• Linear classifiers are useful
for ranking items rather than
simply predicting class
probability.
• These are useful for
determining the important
values, such as valuable
customers, or which
transactions are most likely
fraudulent.
Exhibit 3-13 Illustration of Linear Classifiers
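To show how a linear classifier supports ranking, the sketch below scores each transaction as a weighted sum of its features and sorts by score. The features, weights, and bias are invented for illustration; a real model would estimate them from training data.

```python
# Assumed weights for a hypothetical fraud-risk score.
WEIGHTS = {"amount_thousands": 0.5, "is_weekend": 2.0, "is_round_number": 1.0}
BIAS = -3.0

# Hypothetical transactions and their feature values.
transactions = {
    "T1": {"amount_thousands": 1.0, "is_weekend": 0, "is_round_number": 0},
    "T2": {"amount_thousands": 4.0, "is_weekend": 1, "is_round_number": 1},
    "T3": {"amount_thousands": 6.0, "is_weekend": 0, "is_round_number": 1},
}

def score(features):
    """Linear score: bias plus the weighted sum of the feature values."""
    return BIAS + sum(WEIGHTS[name] * value for name, value in features.items())

scores = {tid: score(features) for tid, features in transactions.items()}

# Ranking: the highest score is the most likely to be fraudulent under this model.
ranked = sorted(scores, key=scores.get, reverse=True)

# Classification: a score above zero falls on the "fraud" side of the line.
flagged = [tid for tid in ranked if scores[tid] > 0]
```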
What else do you need to know about classification? 4
• A support vector machine is a discriminating classifier that is defined by a separating hyperplane that works first to find the widest margin (or biggest pipe) and then works to find the middle line.
Exhibit 3-14 Support Vector Machines
Exhibit 3-15 Support Vector Machine Decision Boundaries
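The "widest pipe, then middle line" idea can be sketched for the simplest case: one variable and two classes that do not overlap. This is a simplification for intuition, not a general support vector machine; the function name and data are hypothetical.

```python
def max_margin_boundary(class_a, class_b):
    """For 1-D separable data, place the boundary midway between the closest opposing points."""
    low, high = (class_a, class_b) if max(class_a) < min(class_b) else (class_b, class_a)
    if max(low) >= min(high):
        raise ValueError("classes overlap; no separating boundary exists")
    # The support vectors are the closest points on each side of the gap.
    support_low, support_high = max(low), min(high)
    pipe_width = support_high - support_low      # the widest "pipe" between the classes
    boundary = (support_low + support_high) / 2  # the middle line of that pipe
    return boundary, pipe_width

# Hypothetical invoice amounts for two classes.
normal = [100, 150, 200]
review = [800, 900, 1000]
boundary, pipe_width = max_margin_boundary(normal, review)
```

With more variables the boundary becomes a line or hyperplane, and finding it requires an optimization routine rather than a midpoint.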
Learning Objective 3-5
Decision support systems use rules to
guide the accountant.
• The rules are derived from
past behavior to help guide
the accountant through a
process.
• For example, the
classification of leases is
based on evaluating several
rules.
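A rule-based decision support system for the lease example might look like the sketch below. The rules are a simplified illustration loosely modeled on common lease classification criteria; the 75% and 90% thresholds, field names, and labels are assumptions, not accounting guidance.

```python
from dataclasses import dataclass

@dataclass
class Lease:
    transfers_ownership: bool
    purchase_option_likely: bool
    lease_term_years: float
    asset_life_years: float
    pv_payments: float
    fair_value: float
    specialized_asset: bool

# Each rule is a (description, test) pair; thresholds are illustrative assumptions.
RULES = [
    ("ownership transfers to the lessee", lambda lease: lease.transfers_ownership),
    ("purchase option likely to be exercised", lambda lease: lease.purchase_option_likely),
    ("term covers most of the asset's life",
     lambda lease: lease.lease_term_years / lease.asset_life_years >= 0.75),
    ("payments cover most of the fair value",
     lambda lease: lease.pv_payments / lease.fair_value >= 0.90),
    ("asset is specialized for the lessee", lambda lease: lease.specialized_asset),
]

def classify_lease(lease):
    """Recommend a classification and report which rules triggered it."""
    triggered = [name for name, rule in RULES if rule(lease)]
    return ("finance" if triggered else "operating"), triggered

# Hypothetical lease: 8-year term on an asset with a 10-year life.
example = Lease(False, False, 8, 10, 70_000, 100_000, False)
recommendation, reasons = classify_lease(example)
```

Returning the triggered rules alongside the recommendation lets the accountant see why the system suggested that classification.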
Chapter 3 Summary
In this chapter, we addressed the third and fourth steps of the IMPACT cycle model: the “P” for “performing test plan” and “A” for “address and refine results.” That is, how are we going to test or analyze the data to address a problem we are facing? (LO 3-1)
We identified descriptive analytics that help describe what happened with the data, including summary statistics, data reduction, and filtering. (LO 3-2)
We provided examples of diagnostic analytics that help users identify relationships in the data that uncover why certain events happen through profiling, clustering, similarity matching, and co-occurrence grouping. (LO 3-3)
We introduced some specific models and terminology related to these tools, including Benford’s law, test and training data, decision trees and boundaries, linear classifiers, and support vector machines. We identified cases where models that overfit existing data are not very accurate at predicting the future. (LO 3-4)
We explained examples of predictive analytics and introduced some data mining concepts related to regression, classification, and link prediction that can help predict future events or values. (LO 3-4)
We discussed prescriptive analytics, including decision support systems and artificial intelligence, and provided some examples of how these systems can make recommendations for future actions. (LO 3-5)
Homework
• Chapter 3 homework:
• DQ2, DQ3, DQ8, DQ9, DQ10
• P2, P4, P7