Empirical Probability Distribution


Fundamentals Of Business Analytics 1

Empirical Probability Distribution

Module 006 Empirical Probability Distribution


At the end of this module, you will be able to:
1. Understand the concept of an Empirical Probability Distribution
2. Explain what a Discrete Empirical Probability Distribution is
3. Explain what a Continuous Empirical Probability Distribution is

INTRODUCTION

The objective of this module is to demonstrate how to convert data into probabilities to support
managerial decisions. Real historical (empirical) data does not necessarily fit a known
distribution; however, the frequencies and rankings in the data can be used to estimate an
appropriate empirical probability distribution. The empirical distributions are later used in
decision trees and simulations to make optimum managerial decisions.

Empirical probability uses the number of occurrences of an outcome within a sample set as a
basis for determining the probability of that outcome. The number of times "event X" happens
divided by the number of trials (for example, 100) is the empirical probability of event X. An
empirical probability is closely related to the relative frequency of an event. An empirical
distribution is one for which each possible event is assigned a probability derived from
experimental observation. It is assumed that the events are independent and the sum of the
probabilities is 1.

Empirical probability, also called experimental probability, is the probability your experiment
will give you a certain result. For example, you could toss a coin 100 times to see how many
heads you get, or you could perform a taste test to see if 100 people preferred cola A or cola B.
You could use this information to make an educated guess (a statistic) about what your
probabilities would be if you performed the experiments 1000, 10,000 or even an unlimited
number of times. If you don’t actually perform the experiment—if you just theorize about it—
then that’s called theoretical probability.

Empirical Probability is probability based upon data. That data can be either the result of a
designed experiment (experimental data) or the result of situations that occur beyond the
control of the analyst (observational data). In the fields of medicine and business, data-driven
probability is referred to as “Evidence-based” probability.

Course Module
In order for a theory to be proved or disproved, empirical evidence must be collected. An
empirical study will be performed using actual market data. In finance for example, many
empirical studies have been conducted on the capital asset pricing model (CAPM), and the results
are slightly mixed.

In some analyses, the model does hold in real world situations, but most studies have disproved
the model for projecting returns. Although the model is not completely valid, that is not to say
that there is no utility associated with using the CAPM. For instance, the CAPM is often used to
estimate the cost of equity that feeds into a company's weighted average cost of capital.

An empirical distribution may represent either a continuous or a discrete distribution. If it
represents a discrete distribution, then sampling is done “on step”. If it represents a continuous
distribution, then sampling is done via “interpolation”. The way the table is described usually
determines if an empirical distribution is to be handled discretely or continuously.

Discrete description          Continuous description

value    probability          value      probability
10       .10                  0 - 10     .10
20       .15                  10 - 20    .15
35       .40                  20 - 35    .40
40       .30                  35 - 40    .30
60       .05                  40 - 60    .05

Table 1                       Table 2
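
The difference between sampling "on step" and sampling via "interpolation" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the continuous reading assumes values are spread evenly inside each interval of Table 2, and the left edge of the first interval is taken to be 0 as the table shows.

```python
import bisect
import random

# Tables 1 and 2: values (or upper interval bounds) and their probabilities.
VALUES = [10, 20, 35, 40, 60]
PROBS = [0.10, 0.15, 0.40, 0.30, 0.05]
LOWER_BOUND = 0  # left edge of the first interval in Table 2


def cumulative(probs):
    """Running totals of the probabilities: 0.10, 0.25, 0.65, 0.95, 1.00."""
    totals, running = [], 0.0
    for p in probs:
        running += p
        totals.append(running)
    totals[-1] = 1.0  # guard against floating-point drift in the last total
    return totals


CUM = cumulative(PROBS)


def sample_on_step(u):
    """Discrete reading: first value whose cumulative probability is >= u."""
    return VALUES[bisect.bisect_left(CUM, u)]


def sample_interpolated(u):
    """Continuous reading: interpolate linearly inside the interval holding u."""
    i = bisect.bisect_left(CUM, u)
    lo_val = LOWER_BOUND if i == 0 else VALUES[i - 1]
    lo_cum = 0.0 if i == 0 else CUM[i - 1]
    fraction = (u - lo_cum) / (CUM[i] - lo_cum)
    return lo_val + fraction * (VALUES[i] - lo_val)


u = random.random()  # one uniform draw feeds either reading
print(u, sample_on_step(u), sample_interpolated(u))
```

For the same uniform draw, the discrete reading can only return one of the five tabulated values, while the continuous reading can return any number between 0 and 60.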

DISCRETE EMPIRICAL PROBABILITY DISTRIBUTIONS

Recall that f(x) is the probability of a specific outcome x, that is, the probability of a specific value
of a random variable. Discrete empirical probability can be calculated by counting the number of
occurrences of each outcome (numeric or otherwise):

f(x) = P(x) = n(x) / n

where n(x) is the number of data points equal to the value x and n is the total number of data
points (sample size).

f(x) = Probability of an event x
     = (No. of times the event x happened) / (No. of times the event could have happened)

EXAMPLE: DISCRETE EMPIRICAL PROBABILITY DISTRIBUTION


A human resources analyst is examining the potential financial implications of employees
choosing retirement plans. The company has four retirement plans: A, B, C, and D. A sample of 25
employees and the plans they have selected is shown in Table 3. Construct the distribution for
the choice of retirement plans.

Obs  Plan     Obs  Plan     Obs  Plan
1    B        11   C        21   C
2    C        12   D        22   C
3    C        13   B        23   B
4    C        14   A        24   B
5    C        15   C        25   C
6    C        16   D
7    D        17   C
8    B        18   C
9    D        19   A
10   B        20   D

Table 3: Sample Data for Retirement Plan Selection

It is useful to first sort the data. The frequencies and probabilities are readily computed after
sorting as in the figure below.

Figure 1: Example of Discrete Empirical Probability Distribution
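
The same frequency count can be reproduced outside a spreadsheet. The sketch below is a minimal illustration, not the layout of Figure 1: it transcribes the 25 plan choices from Table 3 and applies f(x) = n(x) / n to each plan. The variable names are chosen for this example only.

```python
from collections import Counter

# Plans chosen by observations 1..25, transcribed from Table 3.
plans = list("BCCCCCDBDB" "CDBACDCCAD" "CCBBC")

counts = Counter(plans)  # n(x): how often each plan appears
n = len(plans)           # sample size

# Empirical probability f(x) = n(x) / n, listed in sorted plan order.
distribution = {plan: counts[plan] / n for plan in sorted(counts)}

for plan, probability in distribution.items():
    print(plan, counts[plan], probability)
```

The counts are 2, 6, 12 and 5 for plans A, B, C and D, which gives probabilities of 0.08, 0.24, 0.48 and 0.20 that sum to 1.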

As would be expected due to the Law of Large Numbers, the accuracy of this method of
determining discrete probability improves for larger samples.
Examining the previous spreadsheet reveals that there are two methods by which a set of
empirical data may be used to generate random variables:

1. Using the full list of data: Give each element a 1/n probability of selection. The data can be
first sorted. Sorted data gives the analyst a better understanding of the likelihoods of the
various outcomes and, in turn, of the data as a whole.
2. Using the data distribution: This works well when there are not an overly cumbersome
number of levels of the discrete variable.

The spreadsheet shown in Fig. 2 demonstrates how the discrete example could be simulated
using the full list of data.

Figure 2: Simulation of Discrete Data Using Full List of Data

The spreadsheet shown in Fig. 3 is an example of how the discrete empirical data could be
simulated using the probability distribution of the data computed from the previous example.

Figure 3: Simulation of Discrete Data Using the Probability Distribution of the Data

The second method is fundamentally the same as the first, but takes advantage of the way
VLOOKUP works when using an approximate match for data in which the data key (first column
of the data) is sorted from smallest to largest. Compare the two methods mentioned previously
to note that the data distribution method is the full list of data method with the repeated
outcomes removed.
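
The two simulation methods can also be sketched in Python. This is an illustration rather than a copy of the spreadsheets in Figures 2 and 3: the function names are hypothetical, and the approximate-match behaviour of VLOOKUP (return the row with the largest key that does not exceed the lookup value) is imitated here with `bisect`.

```python
import bisect
import random
from collections import Counter

plans = list("BCCCCCDBDBCDBACDCCADCCBBC")  # Table 3, observations 1..25


def simulate_full_list(data, rng):
    """Method 1: every observation has a 1/n chance of being selected."""
    return rng.choice(data)


def build_lookup(data):
    """Method 2 table: sorted outcomes keyed by the cumulative probability
    of all outcomes that come before them (the sorted lookup key)."""
    counts = Counter(data)
    outcomes = sorted(counts)
    keys, running = [], 0
    for outcome in outcomes:
        keys.append(running / len(data))
        running += counts[outcome]
    return keys, outcomes


def lookup(keys, outcomes, u):
    """Approximate match: pick the row with the largest key <= u."""
    return outcomes[bisect.bisect_right(keys, u) - 1]


rng = random.Random(1)  # seeded only so the run is repeatable
keys, outcomes = build_lookup(plans)
print(simulate_full_list(plans, rng), lookup(keys, outcomes, rng.random()))
```

For Table 3 the lookup keys are 0, 0.08, 0.32 and 0.80, so a random number of 0.50 falls in the row for plan C. Both methods select each plan with the same probability; the second simply stores one row per distinct outcome.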

CONTINUOUS EMPIRICAL PROBABILITY DISTRIBUTIONS

With continuous empirical data f(x) can be calculated using the cumulative distribution function
(cdf), F(x). When calculating probabilities from historical data, F(x) is called the Empirical
Cumulative Distribution Function and is abbreviated as ECDF(x). The ECDF(x) is easily calculated
by first sorting the data from smallest to largest and then using the frequency counts to
determine the cumulative probability:

ECDF(x) = F(x) = P(X ≤ x) = n(X ≤ x) / n


where n(X ≤ x) is the number of data points less than or equal to the value x and n is the total
number of data points (sample size).
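
As a minimal sketch of this calculation, the function below sorts a data set and returns ECDF(x) = n(X ≤ x) / n for each distinct value. The function name and the four sample values are hypothetical and serve only to show how repeated values are handled.

```python
def ecdf(data):
    """Return sorted (x, ECDF(x)) pairs, where ECDF(x) = n(X <= x) / n."""
    ordered = sorted(data)  # smallest to largest
    n = len(ordered)
    pairs = []
    for count, x in enumerate(ordered, start=1):
        if pairs and pairs[-1][0] == x:
            pairs[-1] = (x, count / n)  # repeated value: keep the highest count
        else:
            pairs.append((x, count / n))
    return pairs


sample = [3, 1, 2, 2]  # hypothetical data, not taken from the module
print(ecdf(sample))
```

The last pair always has a cumulative probability of 1, because every data point is less than or equal to the largest value.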

EXAMPLE: EMPIRICAL DISTRIBUTION IN A DECISION TREE: PRICING DECISIONS

A company is bidding to supply parts to an electronics manufacturer. The competitors’ bids for
10 previous similar contracts are shown in Table 4. If the bid is won, the total cost of completing
the contract is $350,000. What is the optimum bid?

Obs  Bid         Obs  Bid         Obs  Bid
1    369,800     5    387,300     9    401,400
2    403,200     6    404,800     10   380,300
3    401,800     7    389,700
4    387,600     8    407,700

Table 4: Empirical Probability Distribution for Bidding Example

Consider the abbreviated generic bidding decision tree in Fig. 4.

Figure 4: Abbreviated Generic Decision Tree for Bidding Example

Because the electronics manufacturer will purchase the least expensive components, the low bid
wins in this situation. The probability of winning given a specific bid is therefore the
probability that the competitor's bid is higher:

P(Win | Bid) = 1 − F(Bid) = 1 − ECDF(Bid)


Based on the decision tree, the expected value for a given bid is calculated:

EVBid = P(Win | Bid)(Bid − $350,000) + (1 − P(Win | Bid))($0)
      = P(Win | Bid)(Bid − $350,000)

To compute P(Win | Bid), first calculate the ECDF. The ECDF is computed by first sorting the data
from smallest to largest, then counting the number of data points less than or equal to each
data point, and finally dividing those counts by the sample size:

ECDF(x) = F(x) = P(X ≤ x) = n(X ≤ x) / n

The ECDF is shown in Figure 5.

Figure 5: Empirical Cumulative Distribution Function (ECDF) for the Bidding Example

The probability of winning is then calculated as P(Win | Bid) = 1 − ECDF. Thus, for LOW BID
WINS bidding, the probability of winning is 1 − ECDF. Conversely, for HIGH BID WINS bidding, the
probability of winning is the ECDF.

Figure 6: Probability of Winning Given a Specific Bid (1 − ECDF) for the Bidding Example

From the ECDF, the slopes and intercepts to calculate the probability of winning given a specific
bid using interpolation can be calculated using the method shown in Table 5.

Rank  Obs  ECDF  Bid      P(Win | Bid)  Slope = ∆P(Win | Bid)/∆Bid  Intercept = P(Win | Bid) − Slope × Bid
1     1    0.10  369,800  0.90          −0.0000095                  4.42
2     10   0.20  380,300  0.80          −0.0000143                  6.23
3     5    0.30  387,300  0.70          −0.0003333                  129.80
4     4    0.40  387,600  0.60          −0.0000476                  19.06
5     7    0.50  389,700  0.50          −0.0000085                  3.83
6     9    0.60  401,400  0.40          −0.0002500                  100.75
7     3    0.70  401,800  0.30          −0.0000714                  29.00
8     2    0.80  403,200  0.20          −0.0000625                  25.40
9     6    0.90  404,800  0.10          −0.0000345                  14.06
10    8    1.00  407,700  0.00          −0.0000345                  14.06

Table 5: Slope–Intercept Table to Calculate P(Win | Bid) = 1 − ECDF
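
The slope and intercept columns can be checked with a short script. The sketch below is an illustration, not the spreadsheet formulas themselves: it assumes each row's slope is taken between that bid and the next higher bid, that the last row reuses the slope of the row above it (as the repeated values in the table suggest), and that the intercept is P(Win | Bid) minus Slope times Bid, which is what the tabulated numbers match. The function name is hypothetical.

```python
# Competitors' bids from Table 4, observations 1..10.
bids = [369_800, 403_200, 401_800, 387_600, 387_300,
        404_800, 389_700, 407_700, 401_400, 380_300]


def slope_intercept_table(data):
    """Return one (bid, p_win, slope, intercept) row per sorted bid."""
    ordered = sorted(data)
    n = len(ordered)
    p_win = [1 - rank / n for rank in range(1, n + 1)]  # 1 - ECDF
    rows = []
    for i, bid in enumerate(ordered):
        j = min(i, n - 2)  # the last row reuses the previous segment
        slope = (p_win[j + 1] - p_win[j]) / (ordered[j + 1] - ordered[j])
        intercept = p_win[i] - slope * bid
        rows.append((bid, p_win[i], slope, intercept))
    return rows


for bid, p, slope, intercept in slope_intercept_table(bids):
    print(f"{bid:>8,} {p:5.2f} {slope:12.7f} {intercept:8.2f}")
```

Any bid inside a segment can then be converted to a probability of winning with Slope × Bid + Intercept, using the row whose bid is the largest one not exceeding the bid being evaluated.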

Using Table 5 and the VLOOKUP function, the expected value for a bid, EVBid, can be calculated:

EVBid = P(Win | Bid)(Bid − $350,000)
      = (Slope × Bid + Intercept)(Bid − $350,000)

The optimum bid is obtained using Excel’s One-Way Data Table command.

Figure 7: Calculation of Optimum Bid Using a One-Way Table
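
A one-way data table evaluates the same formula for a column of candidate bids and lets the analyst pick the best one. A rough Python equivalent is sketched below under stated assumptions: candidate bids run from the lowest to the highest observed competitor bid in steps of $100 (a grid chosen for this example, not taken from Figure 7), P(Win | Bid) is interpolated linearly between observed bids, and the function names are hypothetical.

```python
import bisect

COST = 350_000  # cost of completing the contract, from the example

# Competitors' bids from Table 4, sorted from smallest to largest.
bids = sorted([369_800, 403_200, 401_800, 387_600, 387_300,
               404_800, 389_700, 407_700, 401_400, 380_300])
n = len(bids)
p_win_at = [1 - rank / n for rank in range(1, n + 1)]  # 1 - ECDF at each bid


def p_win(bid):
    """Interpolate 1 - ECDF linearly between neighbouring observed bids."""
    if bid <= bids[0]:
        return p_win_at[0]  # assumption: flat below the lowest observed bid
    if bid >= bids[-1]:
        return 0.0
    i = bisect.bisect_right(bids, bid) - 1
    slope = (p_win_at[i + 1] - p_win_at[i]) / (bids[i + 1] - bids[i])
    return p_win_at[i] + slope * (bid - bids[i])


def expected_value(bid):
    """EVBid = P(Win | Bid) * (Bid - cost); losing the bid pays $0."""
    return p_win(bid) * (bid - COST)


# The "data table": evaluate every candidate bid and keep the best one.
candidates = range(bids[0], bids[-1] + 1, 100)
best_bid = max(candidates, key=expected_value)
print(best_bid, round(expected_value(best_bid), 2))
```

With these assumptions the search selects a bid of 387,300, where the probability of winning is 0.70 and the expected value is about $26,110. A spreadsheet that uses a different grid of candidate bids may report a slightly different optimum.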

To generate random bids for simulation, in a manner similar to the method used to simulate the
five-point estimate, the ECDF must first be inverted as shown in Table 6 and the corresponding
graph in Figure 8.

Rank  ECDF = Rand()  Bid      Slope = ∆Bid/∆ECDF  Intercept = Bid − Slope × ECDF
1     0.00           359,300  105,000             359,300
2     0.10           369,800  105,000             359,300
3     0.20           380,300  70,000              366,300
4     0.30           387,300  3,000               386,400
5     0.40           387,600  21,000              379,200
6     0.50           389,700  117,000             331,200
7     0.60           401,400  4,000               399,000
8     0.70           401,800  14,000              392,000
9     0.80           403,200  16,000              390,400
10    0.90           404,800  29,000              378,700
      1.00           407,700  0                   407,700

Table 6: Slope–Intercept Table to Generate Random Bids for Simulation

Figure 8: Inverse ECDF for Generating Random Bids for Simulation

As is the case of simulating the five-point estimate, the RAND() must be calculated in a cell that is
external to the cell used to compute the random variable so that the slope and intercept will
correspond to the appropriate percentile specified by the RAND(). Then the VLOOKUP function is
used to determine the appropriate slope and intercept to calculate the random bid that would
correspond to the percentile generated by the RAND().
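
The same lookup-then-interpolate step can be sketched in Python. This is a minimal illustration with hypothetical names: the ECDF levels and bids are copied from Table 6, the random number is drawn once and stored in a variable (the counterpart of keeping RAND() in its own cell), and that single value is then used both to find the row and to compute the bid.

```python
import bisect
import random

# ECDF levels and the bid at each level, copied from Table 6.
ECDF_LEVELS = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0
BID_AT_LEVEL = [359_300, 369_800, 380_300, 387_300, 387_600, 389_700,
                401_400, 401_800, 403_200, 404_800, 407_700]


def bid_for_percentile(u):
    """Invert the ECDF: find the row for u, then apply Intercept + Slope * u."""
    # Largest level <= u; the top row (u = 1.0) reuses the last segment.
    i = min(bisect.bisect_right(ECDF_LEVELS, u) - 1, len(ECDF_LEVELS) - 2)
    slope = ((BID_AT_LEVEL[i + 1] - BID_AT_LEVEL[i])
             / (ECDF_LEVELS[i + 1] - ECDF_LEVELS[i]))
    intercept = BID_AT_LEVEL[i] - slope * ECDF_LEVELS[i]
    return intercept + slope * u


rng = random.Random(7)  # seeded only so the run is repeatable
u = rng.random()        # drawn once, then reused for lookup and calculation
print(u, bid_for_percentile(u))
```

Drawing the random number inside the lookup and again inside the calculation would pair a slope and intercept from one row with a percentile from another, which is the mismatch the separate RAND() cell avoids.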

ADVANTAGES AND DISADVANTAGES

The main advantage of using empirical probability is that the probability is backed by
experimental studies and data. It is free from assumed data or hypotheses. However, there are
two big disadvantages of empirical probability to consider:

• Drawing incorrect conclusions


Using empirical probability can cause wrong conclusions to be drawn. For example, we know
that the chance of getting a head from a coin toss is ½. However, an individual may toss a coin
three times and get heads in all tosses. He may draw an incorrect conclusion that the chances of
tossing a head from a coin toss are 100%.

• Insufficient sample size


Small sample sizes reduce accuracy. Therefore, large sample sizes are generally used for
empirical probability to attain a good probability representation. For example, if an individual
wanted to know the probability of getting a head in a coin toss but tossed the coin only once,
the empirical probability would be either 0% or 100%.
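
Both disadvantages can be seen in a small simulation. The sketch below is illustrative only: it assumes a fair coin, uses a hypothetical function name, and fixes the random seed so that a run is repeatable.

```python
import random


def empirical_heads_probability(n_tosses, rng):
    """Toss a simulated fair coin n_tosses times; return the share of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


rng = random.Random(2024)  # seeded only so the run is repeatable
for n_tosses in (1, 3, 10, 100, 10_000):
    print(n_tosses, empirical_heads_probability(n_tosses, rng))
```

With a single toss the estimate can only be 0 or 1, and with three tosses it can only be 0, 1/3, 2/3 or 1. As the number of tosses grows, the estimates tend to settle near 0.5.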

Books and Journals

Pinder, J. (2017). Introduction to Business Analytics Using Simulation, 125 London Wall, London EC2Y
5AS, United Kingdom

Schniederjans, M. (2017), Business Analytics Principles, Concepts, and Applications, Pearson Education,
Inc, Upper Saddle River, New Jersey 07458

https://www.investopedia.com/
https://www.managementstudyguide.com/
https://www.statisticshowto.com/experimental-empirical-probability/
