Business Analytics Module 2 Summary

Business Analytics

Module 2 Sampling and Estimation


• It is often very useful to infer attributes of a large population from a smaller
sample. To make sound inferences:
o Make sure the sample is sufficiently large and is representative of the
population.
o Avoid biased results by
§ phrasing questions neutrally;
§ ensuring that the sampling method is appropriate for the
demographic of the target population; and
§ pursuing high response rates.
o If a sample is sufficiently large and representative of the population, the
sample statistics, x̄ and s, should be reasonably good estimates of the
population parameters, µ and σ, respectively.
• The normal distribution has a unique symmetrical shape whose center and
width are determined by its mean and standard deviation respectively.
o Using the properties of the normal distribution, we can calculate a
probability associated with any range of values.
o Several rules of thumb are helpful for estimating probabilities for a
normal distribution.
§ About 68% of the probability is contained in the range reaching one
standard deviation away from the mean on either side, that is,
P(µ − σ ≤ x ≤ µ + σ) ≈ 68%.
§ About 95% of the probability is contained in the range reaching two
standard deviations (more precisely, 1.96) away from the mean on
either side, that is, P(µ − 2σ ≤ x ≤ µ + 2σ) ≈ 95%.
§ About 99.7% of the probability is contained in the range reaching
three standard deviations away from the mean on either side, that
is, P(µ − 3σ ≤ x ≤ µ + 3σ) ≈ 99.7%.
o A z-value of a point x is the distance x lies from the mean, measured in
standard deviations: z = (x − µ) / σ.
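The rules of thumb and the z-value can be checked numerically. The following is a minimal sketch using Python's standard-library NormalDist; the mean of 100, the standard deviation of 15, and the helper names prob_within and z_value are assumptions chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical normal distribution: mean 100, standard deviation 15.
mu, sigma = 100.0, 15.0
dist = NormalDist(mu, sigma)

def prob_within(k):
    """P(mu - k*sigma <= x <= mu + k*sigma), from the cumulative distribution."""
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

def z_value(x):
    """Distance of x from the mean, measured in standard deviations."""
    return (x - mu) / sigma

# Compare against the 68% / 95% / 99.7% rules of thumb.
for k in (1, 2, 3, 1.96):
    print(f"within {k} standard deviations: {prob_within(k):.4f}")

print(f"z-value of x = 130: {z_value(130):.2f}")
```

Because the probabilities depend only on the number of standard deviations, changing mu and sigma leaves the printed percentages unchanged.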
• The Central Limit Theorem states that if we take enough sufficiently large
samples from any population, the means of those samples will be normally
distributed, regardless of the shape of the underlying population.
o The distribution of those sample means, called the Distribution of
Sample Means, more closely approximates a normal curve as we
increase the number of samples and/or the sample size.

© Copyright 2020 President and Fellows of Harvard College. All Rights Reserved.
o The mean of any single sample lies on the normally distributed Distribution
of Sample Means, so we can use the normal curve’s special properties to
draw conclusions from a single sample mean.
o The mean of the Distribution of Sample Means equals the mean of the
population distribution.
o The standard deviation of the Distribution of Sample Means equals the
standard deviation of the population distribution divided by the square root
of the sample size. Thus, increasing the sample size decreases the width
of the Distribution of Sample Means.
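A small simulation can illustrate these properties. This is only a sketch under stated assumptions: the population is taken to be exponential with rate 1 (a strongly skewed distribution whose mean and standard deviation both equal 1), and the sample size of 36 and the 5,000 repetitions are arbitrary choices.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

POP_MEAN = 1.0      # an exponential population with rate 1 has mean 1 ...
POP_SD = 1.0        # ... and standard deviation 1
SAMPLE_SIZE = 36    # hypothetical sample size
NUM_SAMPLES = 5000  # hypothetical number of samples drawn

def sample_mean(n):
    """Draw one sample of size n from the skewed population; return its mean."""
    return statistics.fmean([random.expovariate(1.0) for _ in range(n)])

sample_means = [sample_mean(SAMPLE_SIZE) for _ in range(NUM_SAMPLES)]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
predicted_sd = POP_SD / math.sqrt(SAMPLE_SIZE)  # population sd / sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f} (population mean {POP_MEAN})")
print(f"sd of sample means:   {sd_of_means:.3f} (predicted {predicted_sd:.3f})")
```

Raising SAMPLE_SIZE should shrink the spread of the sample means, in line with the square-root relationship stated above.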
• The sample mean is only a point estimate. Using the properties of the normal
distribution and the Central Limit Theorem, we can construct a range around the
sample mean, called a confidence interval, to estimate the range in which the
true population mean likely lies.
o The width of the confidence interval depends on the level of confidence,
our best estimate of the population standard deviation, and the sample
size. We can only control the level of confidence and the sample size.
o For large samples (n ≥ 30), the lower and upper bounds are calculated
using the following equation: x̄ ± z · s/√n, where z is the z-value for
the chosen level of confidence.
o The function CONFIDENCE.NORM calculates the margin of error, which
we add and subtract from the sample mean to find the confidence interval.
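o For small samples (n < 30), the lower and upper bounds are calculated
using the following equation: x̄ ± t · s/√n, where t is the t-value for
the chosen level of confidence and n − 1 degrees of freedom.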
§ For small samples, we use a t-distribution, which is shorter and
wider than a normal distribution. The t-distribution provides a wider
range, a more conservative estimate of where the true population
mean lies.
§ The function CONFIDENCE.T calculates the margin of error, which
we add and subtract from the sample mean to find the confidence
interval.
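As a rough illustration, the margin of error can be computed as a critical value times s/√n. The sketch below assumes a hypothetical sample (mean 50, s = 8) and uses only Python's standard library. The function names are invented for this example and are not Excel's implementation. Because the standard library has no t-distribution, the t-value 2.131 (95% confidence, 15 degrees of freedom) is read from a standard t-table.

```python
import math
from statistics import NormalDist

def margin_of_error_norm(alpha, standard_dev, size):
    """Margin of error z * s / sqrt(n), with z from the standard normal."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * standard_dev / math.sqrt(size)

def margin_of_error_t(t_critical, standard_dev, size):
    """Margin of error t * s / sqrt(n); the caller supplies the t-value."""
    return t_critical * standard_dev / math.sqrt(size)

# Hypothetical large sample: mean 50, s = 8, n = 64, 95% confidence.
x_bar, s, n = 50.0, 8.0, 64
moe = margin_of_error_norm(0.05, s, n)
large_ci = (x_bar - moe, x_bar + moe)

# Hypothetical small sample: same mean and s, but n = 16.
# 2.131 is the two-tailed 95% t-value for 15 degrees of freedom,
# taken from a standard t-table (an assumption of this sketch).
T_95_DF15 = 2.131
small_n = 16
moe_t = margin_of_error_t(T_95_DF15, s, small_n)
small_ci = (x_bar - moe_t, x_bar + moe_t)

# For comparison: the narrower margin a normal curve would give at n = 16.
moe_z_small = margin_of_error_norm(0.05, s, small_n)

print(f"large-sample interval: {large_ci[0]:.2f} to {large_ci[1]:.2f}")
print(f"small-sample interval: {small_ci[0]:.2f} to {small_ci[1]:.2f}")
print(f"t margin {moe_t:.3f} vs. normal margin {moe_z_small:.3f} at n = 16")
```

The last line shows the more conservative estimate the t-distribution gives: for the same s and n, the t-based margin is wider than the normal-based one.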
• We can also calculate confidence intervals for proportions. To do so, we must
convert data to dummy (0, 1) variables.
o After that, we can proceed as we would with any other confidence interval.
§ When estimating the true population proportion, we should ensure
that the sample size is large enough by checking that both of the
following conditions are true: n · p ≥ 5 and n · (1 − p) ≥ 5. If either of
these guidelines is not satisfied, we must collect a larger sample.
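The steps for a proportion can be sketched as follows. The survey data here (40 "yes" and 60 "no" responses) are hypothetical, and the 95% confidence level is an arbitrary choice. The sketch converts the responses to dummy variables, checks the two sample-size conditions, and then builds the interval in the same way as for a mean.

```python
import math
from statistics import NormalDist, fmean, stdev

# Hypothetical survey: 40 "yes" and 60 "no" responses.
responses = ["yes"] * 40 + ["no"] * 60

# Convert the data to dummy (0, 1) variables.
dummy = [1 if r == "yes" else 0 for r in responses]

n = len(dummy)
p = fmean(dummy)  # sample proportion = mean of the dummy variable

# Sample-size guidelines: both n*p and n*(1 - p) should be at least 5.
size_ok = n * p >= 5 and n * (1 - p) >= 5

# Proceed as with any other large-sample confidence interval.
s = stdev(dummy)                 # sample standard deviation of the 0/1 data
z = NormalDist().inv_cdf(0.975)  # z-value for 95% confidence
margin = z * s / math.sqrt(n)
lower, upper = p - margin, p + margin

print(f"sample proportion: {p:.2f}, size conditions met: {size_ok}")
print(f"95% confidence interval: {lower:.3f} to {upper:.3f}")
```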

EXCEL SUMMARY

Recall the Excel functions and analyses covered in this course and make sure to
familiarize yourself with all of the necessary steps, syntax, and arguments. We have
provided some additional information for the more complex functions listed below. As
usual, the arguments shown in square brackets are optional. The functions whose
names include “S” use the standard normal distribution.

• =RAND()
• =NORM.DIST(x, mean, standard_dev, cumulative)
o When cumulative is set to “TRUE”, NORM.DIST finds the cumulative
probability, that is, the probability of being less than or equal to the
specified value x, for a normal distribution with the specified mean and
standard deviation. (Inserting the value “FALSE” provides the height of the
normal distribution at the value x, which is not covered in this course.)
• =NORM.S.DIST(z, cumulative)
o When cumulative is set to “TRUE”, NORM.S.DIST finds the cumulative
probability, that is, the probability of being less than or equal to the
specified value z for a standard normal distribution.
• =NORM.INV(probability, mean, standard_dev)
o Returns the corresponding x-value on a normal distribution for the
specified mean, standard deviation, and cumulative probability.
• =CONFIDENCE.NORM(alpha, standard_dev, size)
o Returns the margin of error using a normal distribution for a specified
alpha, standard_dev, and size. Alpha is the significance level, which
equals one minus the confidence level (for example, a 95% confidence
interval would correspond to the significance level 0.05).
• =CONFIDENCE.T(alpha, standard_dev, size)
o Returns the margin of error using a t-distribution for a specified alpha,
standard_dev, and size.
• =IF(logical_test, [value_if_true], [value_if_false])
o Returns value_if_true if the specified condition is met, and returns
value_if_false if the condition is not met.
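For readers who want to cross-check spreadsheet results outside Excel, several of these functions have close counterparts in Python's standard library. The wrappers below are a sketch only: their names are invented to echo the Excel functions, and they are not Excel's own implementation.

```python
import random
from statistics import NormalDist

def norm_dist(x, mean, standard_dev, cumulative=True):
    """Counterpart of NORM.DIST: cumulative probability, or curve height if not cumulative."""
    dist = NormalDist(mean, standard_dev)
    return dist.cdf(x) if cumulative else dist.pdf(x)

def norm_s_dist(z, cumulative=True):
    """Counterpart of NORM.S.DIST: the same calculation on the standard normal."""
    return norm_dist(z, 0.0, 1.0, cumulative)

def norm_inv(probability, mean, standard_dev):
    """Counterpart of NORM.INV: the x-value at a given cumulative probability."""
    return NormalDist(mean, standard_dev).inv_cdf(probability)

def rand():
    """Counterpart of RAND: a random number that is at least 0 and less than 1."""
    return random.random()

# Hypothetical values: a distribution with mean 100 and standard deviation 15.
print(norm_dist(115, 100, 15))    # probability of a value at or below 115
print(norm_s_dist(1.0))           # the same probability, expressed as a z-value
print(norm_inv(0.975, 100, 15))   # x-value with 97.5% of the probability below it

# Counterpart of IF(logical_test, value_if_true, value_if_false).
label = "above mean" if 115 > 100 else "at or below mean"
print(label)
```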
