Statistics Needed for SPC: Graphics, Distributions, Tests of Hypothesis

Chapter 2
• Frequency Distribution & Histogram
• Describing Variation
• Numerical Summary of Data
• Probability Distribution
Statistics
• Statistics is a collection of techniques useful for making decisions about a process or population based on an analysis of the information contained in a sample from that population.
• Statistical methods play a vital role in quality control and improvement.
Two kinds of statistics
• Descriptive and inferential.
Describing Variation
• No two units of product produced by a process are identical; some variation is inevitable. For example, the net content of a can of soft drink varies slightly from can to can, and the output voltage of a power supply is not exactly the same from one unit to the next.
• Statistics is the science of analyzing data and drawing conclusions, taking variation in the data into account.
Describing Variation
• The Stem-and-Leaf Plot
• The Histogram
• Numerical Summary of Data
• The Box Plot
• Probability Distributions
1. The Stem-and-Leaf Plot
• Suppose that the data are represented by x1, x2, . . . , xn and that each number xi consists of at least two digits.
• To construct a stem-and-leaf plot, we divide each number xi into two parts: a stem, consisting of one or more of the leading digits; and a leaf, consisting of the remaining digits.
• For example, if the data consist of percent defective information between 0 and 100 on lots of semiconductor wafers, then we can divide the value 76 into the stem 7 and the leaf 6.
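Example
• Data: 15, 27, 8, 17, 13, 22, 24, 25, 13, 36, 32, 32, 32, 28, 43, 7
• Step 1 – Arrange the data in ascending order, grouped by tens:
 0s – 7, 8
 10s – 13, 13, 15, 17
 20s – 22, 24, 25, 27, 28
 30s – 32, 32, 32, 36
 40s – 43
• Step 2 – Write each tens digit once as a stem and list the units digits as its leaves:
Stem | Leaf
0 | 7 8
1 | 3 3 5 7
2 | 2 4 5 7 8
3 | 2 2 2 6
4 | 3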
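The two steps above can be sketched in Python. This is a minimal illustration, assuming non-negative whole-number data and a stem made of the tens digit; `stem_and_leaf` and `print_stem_and_leaf` are hypothetical helper names, not part of any library.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Return {stem: [leaves]} with the tens digit as stem and the units digit as leaf.

    Assumes non-negative integers, as in the example above.
    """
    plot = defaultdict(list)
    for x in sorted(data):            # Step 1: ascending order
        stem, leaf = divmod(x, 10)    # e.g. 76 -> stem 7, leaf 6
        plot[stem].append(leaf)
    # Keep empty stems between the smallest and largest stem so gaps stay visible
    return {s: plot.get(s, []) for s in range(min(plot), max(plot) + 1)}


def print_stem_and_leaf(data):
    for stem, leaves in stem_and_leaf(data).items():
        print(f"{stem:>3} | {' '.join(str(leaf) for leaf in leaves)}")


example = [15, 27, 8, 17, 13, 22, 24, 25, 13, 36, 32, 32, 32, 28, 43, 7]
print_stem_and_leaf(example)
```

Running it prints the same five stem rows as the table above.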
Exercise
• Use 25, 32, 46, 11, 51, 39, 45, 32, 17, 30 to draw the stem-and-leaf plot, and also calculate the following:
A. What is the range of the data? (range = highest value – minimum value)
B. What is the mean (average) of the data?
C. What is the median of the data?
D. What is the mode of the data?
Solution (data: 25, 32, 46, 11, 51, 39, 45, 32, 17, 30)
Stem | Leaf
1 | 1 7
2 | 5
3 | 0 2 2 9
4 | 5 6
5 | 1
• A. Range = highest value – minimum value = 51 – 11 = 40
• B. Mean (average) = sum/n = 328/10 = 32.8
• C. Median: in ascending order 11, 17, 25, 30, 32, 32, 39, 45, 46, 51, the two middle values are 32 and 32, so the median = (32 + 32)/2 = 64/2 = 32
• D. Mode (the most frequent value) = 32
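The four answers can be checked with Python's standard `statistics` module. This is a quick sketch; note that `statistics.median` averages the two middle values when n is even, exactly as done by hand above.

```python
import statistics

data = [25, 32, 46, 11, 51, 39, 45, 32, 17, 30]

data_range = max(data) - min(data)   # A. highest value - minimum value
mean = statistics.mean(data)         # B. 328 / 10
median = statistics.median(data)     # C. average of the 5th and 6th sorted values
mode = statistics.mode(data)         # D. most frequent value

print(data_range, mean, median, mode)
```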
Exercise

Draw a stem and leaf diagram for the following data

Question A. 72, 85, 89, 93, 88, 76, 108, 115, 97, 102, 113

Question B. 1.2, 2.3, 1.5, 2.4, 3.6, 1.8, 2.7, 3.2, 4.1, 2.9, 4.5, 7.6, 5.8,
9.3, 10.6, 12.4, 10.9
2. Histogram
➢ The histogram is a bar chart showing the distribution of a variable, i.e., a graphical presentation of a grouped frequency distribution.
➢ It consists of a series of adjacent rectangles
 ❑ whose bases are the class intervals, specified in terms of class boundaries (equal to the class width of the corresponding classes) and shown on the x-axis, and
 ❑ whose heights are proportional to the corresponding class frequencies, shown on the y-axis.
Histogram
➢ It can help suggest both the nature of, and possible improvements to, the physical mechanisms/quality characteristics in the process.
A histogram is designed to show:
• a distribution or spread of data,
• the numbers of times various measurement values occur,
• groups of values, so that patterns of variation are easily identifiable, and
• the average as well as the variability/variation of a data set.
Histogram
• Elements of a histogram:
 • Horizontal axis: lists measurement values.
 • Vertical axis: shows the frequency or amount of values.
 • Width of each bar: represents an arbitrary range of values.
 • Height of each bar: represents the number of times the values within the specified range are observed.
 • Pattern created by the bar heights: displays a graphic distribution.
Step 1. Calculate the range: R = highest value – lowest value.
Step 2. Calculate the number of classes (K), if not given, using Sturges' formula: K = 1 + 3.3 log N (base-10 logarithm), where N is the number of observations.
Step 3. Calculate the class width: CW = R/K (rounding up, if necessary).
Step 4. Determine the class boundaries: choose the boundaries of the intervals so that they include the smallest and the largest values, and write them down in the frequency table.
Step 5. Prepare the frequency table: a table in which the class, midpoint, tally marks, frequency, etc., can be recorded.
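Steps 1 to 5 can be sketched as a small Python function. This is a minimal sketch under stated assumptions: the data are whole numbers (so class boundaries sit 0.5 below and above the class limits), K is rounded up, and extra classes are added if K classes of width CW do not reach the maximum. `sturges_classes` and `frequency_table` are hypothetical names.

```python
import math


def sturges_classes(n):
    """Step 2: K = 1 + 3.3 log10(N), rounded up to a whole number of classes."""
    return math.ceil(1 + 3.3 * math.log10(n))


def frequency_table(data, k=None):
    """Steps 1-5 for whole-number data (assumed recorded to the nearest unit).

    Returns rows of (class limits, class boundaries, midpoint, frequency).
    """
    lo, hi = min(data), max(data)
    r = hi - lo                                  # Step 1: range
    if k is None:
        k = sturges_classes(len(data))           # Step 2: number of classes
    width = max(1, math.ceil(r / k))             # Step 3: class width, rounded up
    rows = []
    lower = lo
    while lower <= hi:                           # Step 4: classes must cover min..max
        upper = lower + width - 1
        freq = sum(lower <= x <= upper for x in data)   # Step 5: count per class
        rows.append(((lower, upper), (lower - 0.5, upper + 0.5),
                     (lower + upper) / 2, freq))
        lower += width
    return rows


for row in frequency_table([25, 32, 46, 11, 51, 39, 45, 32, 17, 30]):
    print(row)
```

For these 10 values, K = 5 and CW = 40/5 = 8, but 5 classes of width 8 stop at 50, so a sixth class is needed to include the maximum 51. With a range of 100 to 134 and K = 7, the function gives the seven classes 100-104 to 130-134 used in the example that follows.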
Example 2
The following data represent the record high temperatures in degrees Fahrenheit (°F) for each of the 50 states. Construct a grouped frequency distribution for the data and plot the histogram, together with its mean and standard deviation.
Step 1. Calculate the range:
Range = highest value – lowest value = 134 – 100 = 34
Step 2. Calculate the number of classes (K) using Sturges' formula:
K = 1 + 3.3 log 50 = 1 + 3.3(1.699) = 6.61, so the number of classes is taken as 7.
Step 3. Calculate the class width:
CW = R/K = 34/7 = 4.86 ≈ 4.9, so CW = 5 (rounding up).


Class limits | Class boundaries | Midpoint (Xm) | Tally | Frequency (f)
100-104 | 99.5-104.5 | 102 | // | 2
105-109 | 104.5-109.5 | 107 | ///// /// | 8
110-114 | 109.5-114.5 | 112 | ///// ///// ///// /// | 18
115-119 | 114.5-119.5 | 117 | ///// ///// /// | 13
120-124 | 119.5-124.5 | 122 | ///// // | 7
125-129 | 124.5-129.5 | 127 | / | 1
130-134 | 129.5-134.5 | 132 | / | 1
N = Σf = 50
[Figure: histogram of the frequency table above; x-axis: Values (degrees Fahrenheit), class boundaries 99.5 to 134.5; y-axis: Frequency, 0 to 20.]
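The example also asks for the mean and standard deviation. Since only the grouped table is available here, a common approach is to treat every observation as sitting at its class midpoint. The sketch below assumes that approximation and uses the n − 1 divisor of the sample variance; the results are therefore grouped-data approximations, not the exact values from the raw temperatures.

```python
import math

# Class midpoints (Xm) and frequencies (f) from the frequency table above
midpoints = [102, 107, 112, 117, 122, 127, 132]
freqs = [2, 8, 18, 13, 7, 1, 1]

n = sum(freqs)                                            # N = sum of f = 50
mean = sum(f * m for f, m in zip(freqs, midpoints)) / n   # grouped mean = sum(f*Xm) / N
# Grouped sample variance with the n - 1 divisor
var = sum(f * (m - mean) ** 2 for f, m in zip(freqs, midpoints)) / (n - 1)
sd = math.sqrt(var)

print(n, round(mean, 2), round(sd, 2))
```

Here Σ f·Xm = 5710, so the grouped mean is 5710/50 = 114.2 °F, and the grouped standard deviation is about 6.07 °F.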
3. Numerical Summary of Data
• The stem-and-leaf plot and the histogram provide a visual display of three properties of sample data:
 1 → the shape of the distribution of the data,
 2 → the central tendency in the data, and
 3 → the scatter or variability in the data.
• It is also helpful to use numerical measures of central tendency and scatter.
Average & Variance
• Suppose that x1, x2, . . . , xn are the observations in a sample. The most important measure of central tendency in the sample is the sample average:
$$ \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i $$
The sample average represents the center of mass of the sample data.
• The variability in the sample data is measured by the sample variance:
$$ s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 $$
• If there is no variability in the sample, then each sample observation xi = x̄ and the sample variance s² = 0.
• Generally, the larger the sample variance s², the greater the variability in the sample data.
Average & Variance
• The units of the sample variance s² are the square of the original units of the data. This is often inconvenient and awkward to interpret, so we usually prefer to use the square root of s², called the sample standard deviation s, as a measure of variability.
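The formulas for x̄, s² and s translate directly into code. The sketch below writes them out by hand (the helper names are illustrative) and checks them against Python's `statistics.variance`, which also divides by n − 1. It reuses the stem-and-leaf exercise data.

```python
import math
import statistics


def sample_mean(xs):
    """x-bar = (x1 + ... + xn) / n"""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """s^2 = sum((xi - x-bar)^2) / (n - 1)"""
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)


def sample_sd(xs):
    """s = square root of s^2, in the original units of the data"""
    return math.sqrt(sample_variance(xs))


data = [25, 32, 46, 11, 51, 39, 45, 32, 17, 30]
print(sample_mean(data), round(sample_variance(data), 3), round(sample_sd(data), 3))
print(statistics.variance(data), statistics.stdev(data))   # same values from the stdlib
```

A sample with no variability, such as [5, 5, 5], gives s² = 0, as stated above.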
The Box Plot
• The box plot is a graphical display that simultaneously shows several important features of the data, such as location or central tendency, spread or variability, departure from symmetry, and identification of observations that lie unusually far from the bulk of the data (these observations are often called “outliers”).
• A box plot displays five values of the data on a rectangular box, aligned either horizontally or vertically:
 → the three quartiles (Q1, the median and Q3), and
 → the minimum and the maximum.
The Box Plot: Example
Data: 11, 22, 20, 14, 29, 8, 35, 27, 13, 49, 10, 24, 17
• Step 1: Arrange the data in ascending order:
 8, 10, 11, 13, 14, 17, 20, 22, 24, 27, 29, 35, 49
• Step 2: Calculate the median of the data set (the 7th of the 13 values) = 20
• Step 3: Find the median of the lower half of the data, 8, 10, 11, 13, 14, 17: (11 + 13)/2 = 12 = Q1
• Step 4: Find the median of the upper half of the data, 22, 24, 27, 29, 35, 49: (27 + 29)/2 = 28 = Q3
The Box Plot: Outlier Check
• Step 5: Check whether the minimum and maximum values are outliers.
• Step 6: Calculate the range within which values are not outliers (the fences):
 [Q1 − 1.5 IQR, Q3 + 1.5 IQR], where IQR = Q3 − Q1 is the interquartile range. Any member of the data set outside this range is treated as an outlier.
• IQR = Q3 – Q1 = 28 – 12 = 16
• Fences = [12 − 1.5(16), 28 + 1.5(16)] = [−12, 52]
• The minimum (8) and the maximum (49) both lie within the fences [−12, 52], so there is no outlier in the data, and the box plot can be drawn.
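Steps 1 to 6 can be sketched in Python. This follows the "median of each half" rule used above (for odd n the overall median is left out of both halves); other software, including `statistics.quantiles`, may use a different quartile rule and give slightly different Q1 and Q3. The function names are hypothetical.

```python
from statistics import median


def quartiles(data):
    """Q1, median, Q3 by the median-of-halves method used in this section."""
    xs = sorted(data)                                  # Step 1: ascending order
    half = len(xs) // 2
    lower = xs[:half]                                  # lower half
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]   # skip the median if n is odd
    return median(lower), median(xs), median(upper)    # Steps 2-4


def box_plot_summary(data):
    q1, q2, q3 = quartiles(data)
    iqr = q3 - q1
    fences = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)          # Step 6: [Q1 - 1.5 IQR, Q3 + 1.5 IQR]
    outliers = [x for x in data if x < fences[0] or x > fences[1]]
    return {"min": min(data), "Q1": q1, "median": q2, "Q3": q3,
            "max": max(data), "IQR": iqr, "fences": fences, "outliers": outliers}


print(box_plot_summary([11, 22, 20, 14, 29, 8, 35, 27, 13, 49, 10, 24, 17]))
```

The same function applied to the class-work data below flags 76 as the only outlier.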
Class Work
• Draw a box plot for the following data:
 18, 34, 76, 29, 15, 41, 46, 25, 54, 38, 20, 32, 43, 22
Home Work
• Draw a box plot for the following data:
 18, 34, 76, 29, 15, 41, 46, 25, 54, 38, 20, 32, 43, 22
Home Work: Solution
• Ascending order: 15, 18, 20, 22, 25, 29, 32, 34, 38, 41, 43, 46, 54, 76
→ Median = (32 + 34)/2 = 33, the average of the two middle values (n = 14 is even)
→ Min = 15
→ Max = 76
→ Q1 = median of the lower half 15, 18, 20, 22, 25, 29, 32 = 22
→ Q3 = median of the upper half 34, 38, 41, 43, 46, 54, 76 = 43
→ Interquartile range (IQR) = Q3 – Q1 = 43 – 22 = 21
• Outlier check: [Q1 − 1.5 IQR, Q3 + 1.5 IQR] = [22 − 1.5(21), 43 + 1.5(21)] = [−9.5, 74.5]
• In 15, 18, 20, 22, 25, 29, 32, 34, 38, 41, 43, 46, 54, 76, the maximum 76 lies above 74.5, so 76 is an outlier.
PROBABILITY DISTRIBUTIONS
Probability and Statistics
Probability is the chance that an outcome (or event) occurs in an experiment.
Experiment: tossing a fair coin
Outcomes: head, tail
• Probability deals with predicting the likelihood of future events.
• Statistics involves the analysis of the frequency of past events.
Example: Consider a drawer containing 100 socks: 30 red, 20 blue and 50 black socks. We can use probability to answer questions about the selection of a random sample of these socks.
• PQ1. What is the probability that we draw two blue socks or two red socks from the drawer?
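PQ1 can be worked exactly with fractions. This sketch assumes the two socks are drawn one after the other without replacement (the question does not say); with replacement the answer would differ. Because "two blue" and "two red" cannot both happen, their probabilities add.

```python
from fractions import Fraction

# Assumption: two socks drawn at random WITHOUT replacement
red, blue, black = 30, 20, 50
total = red + blue + black


def p_two_of(count, total):
    """P(both socks drawn are of a colour that has `count` socks in the drawer)."""
    return Fraction(count, total) * Fraction(count - 1, total - 1)


# Mutually exclusive events, so the probabilities add
p = p_two_of(blue, total) + p_two_of(red, total)
print(p, float(p))
```

Under this assumption, P(two blue) = (20/100)(19/99), P(two red) = (30/100)(29/99), and the total is 25/198 ≈ 0.126.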
Two Kinds of Distributions
Probability Distributions
• The appearance of a discrete distribution is that of a series of vertical “spikes,” with the height of each spike proportional to the probability. We write the probability that the random variable x takes on the specific value xi as
$$ P(x = x_i) = p(x_i) $$
• The appearance of a continuous distribution is that of a smooth curve, with the area under the curve equal to probability, so that the probability that x lies in the interval from a to b is written as
$$ P(a \le x \le b) = \int_a^b f(x)\,dx $$
A Discrete Distribution
Based on discrete (countable) events.
Random Numbers and Discrete Distribution
Cumulative Distribution Function
The cumulative distribution function (CDF), F(x), of a discrete random variable X is defined by
$$ F(x) = P(X \le x) = \sum_{\text{all } i \le x} P(i) $$
Experiment: Tossing a coin 3 times. Let x = number of heads.
Possible outcomes (from the tree diagram): HHH, HHT, HTH, HTT, THH, THT, TTH, TTT
Cumulative probability distribution:
x | P(x) | F(x)
0 | 1/8 | 1/8
1 | 3/8 | 1/8 + 3/8 = 4/8
2 | 3/8 | 1/8 + 3/8 + 3/8 = 7/8
3 | 1/8 | 1/8 + 3/8 + 3/8 + 1/8 = 1
Calculate: P(x < 3), P(0 ≤ x ≤ 3), P(1 ≤ x < 2)
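The table and the three requested probabilities can be reproduced by enumerating the 8 equally likely outcomes. This is a sketch using exact fractions; `pmf` and `cdf` are illustrative names. Interval probabilities follow from F: P(a ≤ X ≤ b) = F(b) − F(a − 1) for integer-valued X.

```python
from fractions import Fraction
from itertools import product

# All 2^3 equally likely outcomes of tossing a fair coin three times
outcomes = list(product("HT", repeat=3))

# P(x) for x = number of heads
pmf = {x: Fraction(sum(o.count("H") == x for o in outcomes), len(outcomes))
       for x in range(4)}


def cdf(x):
    """F(x) = P(X <= x) = sum of P(i) over all i <= x."""
    return sum((p for i, p in pmf.items() if i <= x), Fraction(0))


print(pmf)
print(cdf(2))             # P(X < 3) = P(X <= 2)
print(cdf(3) - cdf(-1))   # P(0 <= X <= 3)
print(cdf(1) - cdf(0))    # P(1 <= X < 2) = P(X = 1)
```

The three answers are 7/8, 1 and 3/8.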
Expected Values of Discrete Random Variables
- The expected value (or expectation) of a random variable is its mean.
- It is the value that is expected to occur, on average.
- The expected value measures the centrality of the probability distribution, as the mean or average does for a frequency distribution.
- It is a weighted average, with the values of the random variable weighted by their probabilities.
[Figure: probability distribution of x = 0, 1, 2, 3, 4, 5 with the mean 2.3 marked.]
The expected value of a discrete random variable X is equal to the sum of each value of the random variable multiplied by its probability:
$$ \mu = E(X) = \sum_{\text{all } x} x\,P(X = x) $$
x | P(x) | xP(x)
0 | 0.1 | 0.0
1 | 0.2 | 0.2
2 | 0.3 | 0.6
3 | 0.2 | 0.6
4 | 0.1 | 0.4
5 | 0.1 | 0.5
Total | 1.0 | 2.3 = E(X) = μ
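The weighted-average formula μ = Σ x P(X = x) is a one-line computation. The sketch below (with a hypothetical `expected_value` helper) also checks that the probabilities sum to 1, since a valid distribution requires it.

```python
def expected_value(pmf):
    """mu = E(X) = sum of x * P(X = x) over all x, for a pmf given as {x: P(x)}."""
    if abs(sum(pmf.values()) - 1) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in pmf.items())


pmf = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.2, 4: 0.1, 5: 0.1}
print(round(expected_value(pmf), 4))   # matches the column total xP(x) = 2.3
```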
Example
x | −1 | 0 | 1 | 2
P(X = x) | a | 0.3 | 0.2 | 0.3
Given E(x) = 0.6:
A. Find a
B. Find P(X = 1)
C. Find P(0 < x ≤ 2)
Solution
A. Since Σ P(X = x) = 1: a + 0.3 + 0.2 + 0.3 = 1, so a = 1 − 0.8 = 0.2.
 Check with E(x) = Σ x P(X = x): −1(a) + 0(0.3) + 1(0.2) + 2(0.3) = 0.6 → −a + 0.2 + 0.6 = 0.6 → a = 0.2.
B. P(X = 1) = 0.2 (read from the table).
C. P(0 < x ≤ 2) = P(X = 1) + P(X = 2) = 0.2 + 0.3 = 0.5.
Example
x | −1 | 0 | 1 | 2
P(X = x) | 0.2 | 0.3 | a | b
Given E(x) = 0.6:
A. Find a and b
B. Find P(X = 1)
C. Find P(0 < x ≤ 2)
Solution
A. Since Σ P(X = x) = 1: 0.2 + 0.3 + a + b = 1 → a + b = 0.5
 From E(x) = Σ x P(X = x): −1(0.2) + 0(0.3) + 1(a) + 2(b) = 0.6 → −0.2 + a + 2b = 0.6 → a + 2b = 0.8
 Solving the simultaneous equations: subtracting a + b = 0.5 from a + 2b = 0.8 gives b = 0.3; then a + 0.3 = 0.5 → a = 0.5 − 0.3 = 0.2.
 So a = 0.2 and b = 0.3.
B. P(X = 1) = a = 0.2
C. P(0 < x ≤ 2) = P(X = 1) + P(X = 2) = a + b = 0.5
Cont’d
The expected value of a function h(X) of a discrete random variable X is:
$$ E[h(X)] = \sum_{\text{all } x} h(x)\,P(x) $$
Example: for a linear function h(X) = aX + b,
$$ E(aX + b) = aE(X) + b = a\sum_{\text{all } x} x\,P(X = x) + b $$
Example: y = 3x + 5
x | P(x) | xP(x)
3 | 0.2 | 0.6
4 | 0.1 | 0.4
5 | 0.3 | 1.5
6 | 0.2 | 1.2
7 | 0.2 | 1.4
Σ xP(x) = 5.1
E(y) = 3E(x) + 5 = 3(5.1) + 5 = 20.3
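The rule E(aX + b) = aE(X) + b can be checked numerically by computing E[h(X)] term by term with the general formula and comparing it with the shortcut. This is a sketch; `expect` is an illustrative helper that takes any function h.

```python
pmf = {3: 0.2, 4: 0.1, 5: 0.3, 6: 0.2, 7: 0.2}


def expect(h, pmf):
    """E[h(X)] = sum of h(x) * P(x) over all x."""
    return sum(h(x) * p for x, p in pmf.items())


e_x = expect(lambda x: x, pmf)                 # E(X) = 5.1
e_y_direct = expect(lambda x: 3 * x + 5, pmf)  # E[3X + 5] computed term by term
e_y_rule = 3 * e_x + 5                         # E(aX + b) = aE(X) + b
print(round(e_x, 4), round(e_y_direct, 4), round(e_y_rule, 4))
```

Both routes give E(y) = 20.3. The shortcut only holds for linear h; for something like h(x) = x², E[h(X)] must be computed term by term.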
Cont’d
Example
Monthly sales of a certain product are believed to follow the given probability distribution. Suppose the company has a fixed monthly production cost of $8000 and that each item brings $2. Find the expected monthly profit, E[h(X)], from product sales.
Number of items, x | P(x) | xP(x)
5000 | 0.2 | 1000
6000 | 0.3 | 1800
7000 | 0.2 | 1400
8000 | 0.2 | 1600
9000 | 0.1 | 900
Σ xP(X = x) = 6700
h(X) = 2X − 8000, where X = number of items sold
E[h(X)] = 2 Σ xP(X = x) − 8000 = 2(6700) − 8000 = $5400
Variance and Standard Deviation of a Random Variable
The variance of a random variable is the expected squared deviation from the mean:
$$ \sigma^2 = V(X) = E[(X - \mu)^2] = \sum_{\text{all } x} (x - \mu)^2 P(x) $$
An equivalent computational form is
$$ V(X) = E(X^2) - [E(X)]^2 $$
The standard deviation of a random variable is the square root of its variance:
$$ \sigma = SD(X) = \sqrt{V(X)} $$
Example for data x = 1, 2, 3
x | x² | P(x) | x²P(x) | xP(x)
1 | 1 | 0.5 | 1 × 0.5 = 0.5 | 0.5
2 | 4 | 0.2 | 4 × 0.2 = 0.8 | 0.4
3 | 9 | 0.3 | 9 × 0.3 = 2.7 | 0.9
Sum | | | E(X²) = 4 | E(X) = 1.8
V(X) = E(X²) − [E(X)]²
E(X²) = Σ x²P(x) = 0.5 + 0.8 + 2.7 = 4
E(X) = Σ xP(x) = 0.5 + 0.4 + 0.9 = 1.8, so [E(X)]² = 1.8 × 1.8 = 3.24
V(X) = 4 − 3.24 = 0.76
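The shortcut V(X) = E(X²) − [E(X)]² and the definition V(X) = E[(X − μ)²] must agree. The sketch below computes both for this table; `mean_and_variance` is an illustrative helper name.

```python
def mean_and_variance(pmf):
    """Return E(X) and V(X) = E(X^2) - [E(X)]^2 for a discrete pmf {x: P(x)}."""
    e_x = sum(x * p for x, p in pmf.items())
    e_x2 = sum(x ** 2 * p for x, p in pmf.items())
    return e_x, e_x2 - e_x ** 2


pmf = {1: 0.5, 2: 0.2, 3: 0.3}
mu, var = mean_and_variance(pmf)
# Cross-check with the definition V(X) = E[(X - mu)^2]
var_def = sum((x - mu) ** 2 * p for x, p in pmf.items())
print(round(mu, 4), round(var, 4), round(var_def, 4))
```

Both give V(X) = 0.76.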
Example
• Suppose 60% of Ethiopian adults approve of the way Prime Minister Abiy is handling his job.
• A random sample of 2 Ethiopian adults is selected.
• Let X represent the number who approve.
• Calculate V(X) = E(X²) − [E(X)]², given:
x | 0 | 1 | 2
P(X = x) | 0.16 | 0.48 | 0.36
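The given probabilities can be reproduced, and V(X) computed, with a short sketch. It assumes the two adults respond independently, each approving with probability 0.6 (the example does not state independence explicitly). Then P(0) = 0.4², P(1) = 2(0.6)(0.4) and P(2) = 0.6².

```python
p = 0.6  # proportion of adults who approve (from the example)

# Assumption: the two sampled adults respond independently of each other
pmf = {0: (1 - p) ** 2, 1: 2 * p * (1 - p), 2: p ** 2}   # 0.16, 0.48, 0.36

e_x = sum(x * q for x, q in pmf.items())        # E(X)
e_x2 = sum(x ** 2 * q for x, q in pmf.items())  # E(X^2)
var = e_x2 - e_x ** 2                           # V(X) = E(X^2) - [E(X)]^2
print({x: round(q, 2) for x, q in pmf.items()}, round(e_x, 2), round(var, 2))
```

Here E(X) = 1.2, E(X²) = 1.92 and V(X) = 1.92 − 1.44 = 0.48.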
Some Properties of Means and Variances of Random Variables
The mean or expected value of the sum of random variables is the sum of their means or expected values:
$$ \mu_{X+Y} = E(X + Y) = E(X) + E(Y) = \mu_X + \mu_Y $$
For example: E(X) = $350 and E(Y) = $200, so E(X + Y) = $350 + $200 = $550.
The variance of the sum of mutually independent random variables is the sum of their variances:
$$ \sigma^2_{X+Y} = V(X + Y) = V(X) + V(Y) = \sigma^2_X + \sigma^2_Y $$
This holds when X and Y are independent (more generally, whenever they are uncorrelated).
For example: V(X) = 84 and V(Y) = 60, so V(X + Y) = 144.
Cont’d
$$ E(X_1 + X_2 + \cdots + X_k) = E(X_1) + E(X_2) + \cdots + E(X_k) $$
NOTE:
$$ E(a_1X_1 + a_2X_2 + \cdots + a_kX_k) = a_1E(X_1) + a_2E(X_2) + \cdots + a_kE(X_k) $$
The variance of the sum of k mutually independent random variables is the sum of their variances:
$$ V(X_1 + X_2 + \cdots + X_k) = V(X_1) + V(X_2) + \cdots + V(X_k) $$
and
$$ V(a_1X_1 + a_2X_2 + \cdots + a_kX_k) = a_1^2V(X_1) + a_2^2V(X_2) + \cdots + a_k^2V(X_k) $$
Cont’d
The variance of a linear function of a random variable is:
$$ V(aX + b) = a^2V(X) = a^2\sigma^2 $$
Example (monthly sales, h(X) = 2X − 8000)
Number of items, x | P(x) | xP(x) | x² | x²P(x)
5000 | 0.2 | 1000 | 25,000,000 | 5,000,000
6000 | 0.3 | 1800 | 36,000,000 | 10,800,000
7000 | 0.2 | 1400 | 49,000,000 | 9,800,000
8000 | 0.2 | 1600 | 64,000,000 | 12,800,000
9000 | 0.1 | 900 | 81,000,000 | 8,100,000
Total | 1.0 | 6700 | | 46,500,000
V(X) = E(X²) − [E(X)]² = 46,500,000 − (6700)² = 46,500,000 − 44,890,000 = 1,610,000
SD(X) = √1,610,000 ≈ 1268.86
V(2X − 8000) = (2²)V(X) = (4)(1,610,000) = 6,440,000
σ(2X − 8000) = SD(2X − 8000) = 2σX = (2)(1268.86) = 2537.72
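The sales calculation can be verified in a few lines. This sketch computes E(X), E(X²) and V(X) from the table, then applies E(aX + b) = aE(X) + b, V(aX + b) = a²V(X) and SD(aX + b) = |a|SD(X) to the profit h(X) = 2X − 8000.

```python
import math

sales_pmf = {5000: 0.2, 6000: 0.3, 7000: 0.2, 8000: 0.2, 9000: 0.1}

e_x = sum(x * p for x, p in sales_pmf.items())          # E(X) = 6700
e_x2 = sum(x ** 2 * p for x, p in sales_pmf.items())    # E(X^2) = 46,500,000
var_x = e_x2 - e_x ** 2                                  # V(X) = 1,610,000
sd_x = math.sqrt(var_x)

# Profit h(X) = 2X - 8000
a, b = 2, -8000
e_profit = a * e_x + b        # E(aX + b) = aE(X) + b
var_profit = a ** 2 * var_x   # V(aX + b) = a^2 V(X); the constant b drops out
sd_profit = abs(a) * sd_x     # SD(aX + b) = |a| SD(X)
print(round(e_x), round(e_x2), round(var_x), round(sd_x, 2),
      round(e_profit), round(var_profit), round(sd_profit, 2))
```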
Summarizing the properties of Expected Value and Variance of R.V.:
1. E(c) = c
 • The expected value of a constant (c) is just the value of the constant.
2. E(X + c) = E(X) + c
3. E(cX) = cE(X)
 • We can “pull” a constant out of the expected value expression (either as part of a sum with a random variable X or as a coefficient of random variable X).
Cont’d
4. V(c) = 0
 • The variance of a constant (c) is zero.
5. V(X + c) = V(X)
 • The variance of a random variable plus a constant is just the variance of the random variable.
6. V(cX) = c²V(X)
 • The variance of a random variable multiplied by a constant coefficient is the coefficient squared times the variance of the random variable.
Continuous Random Variable & Probability Distributions
Random Variable
Random variables are of two types:
• Discrete random variables
• Continuous random variables
Introduction
A continuous random variable is a variable that can assume any value on a continuum (it can assume an uncountable number of values). It
⚫ has an uncountably infinite number of possible values,
⚫ moves continuously from value to value,
⚫ has no measurable probability associated with any single value (P(X = a) = 0), and
⚫ typically arises from measurements (e.g., height, weight, speed, length).
Continuous Probability Distributions
A continuous random variable can assume any value in an interval on the real line or in a collection of intervals.
The probability of the random variable assuming a value within some given interval from x1 to x2 is defined to be the area under the graph of the probability density function between x1 and x2.
Continuous Probability Density Function
This curve, a function of x, is denoted by the symbol f(x) and is variously called a probability density function (pdf), a frequency function, or a probability distribution.
The areas under a probability density function correspond to probabilities for x. The area A beneath the curve between two points a and b is the probability that x assumes a value between a and b.
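Rules for CRV
1. f(x) ≥ 0 for all x.
2. The total area under the curve is 1: $\int_{-\infty}^{\infty} f(x)\,dx = 1$.
3. $P(a \le X \le b) = \int_a^b f(x)\,dx$.
4. P(X = a) = 0 for any single value a, so P(a ≤ X ≤ b) = P(a < X < b).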
Probability Distributions
Continuous probability distributions:
• Continuous uniform distributions
• Continuous normal distributions
• Continuous joint probability distributions
• Exponential distributions
• Weibull distributions
Uniform Distribution
Continuous random variables that appear to have equally likely outcomes over their range of possible values possess a uniform probability distribution.
Suppose the random variable x can assume values only in an interval c ≤ x ≤ d. Then the uniform frequency function has a rectangular shape.
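Uniform Probability Distribution of Random Variable x
Probability density function:
$$ f(x) = \frac{1}{d - c}, \quad c \le x \le d $$
Mean and standard deviation:
$$ \mu = \frac{c + d}{2}, \qquad \sigma = \frac{d - c}{\sqrt{12}} $$
Probability over an interval:
$$ P(a \le x \le b) = \frac{b - a}{d - c}, \quad c \le a \le b \le d $$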
Cont’d
Example: Suppose you are a production manager of a soft drink bottling company. You believe that when a machine is set to dispense 12 liters, it really dispenses between 11.5 and 12.5 liters inclusive. Suppose the amount dispensed has a uniform distribution. What is the probability that less than 11.8 liters is dispensed?
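Solution
f(x) = 1/(d − c) = 1/(12.5 − 11.5) = 1/1 = 1.0
[Figure: uniform density of height 1.0 from 11.5 to 12.5, with the area from 11.5 to 11.8 shaded.]
P(11.5 ≤ x ≤ 11.8) = (base) × (height) = (11.8 − 11.5) × (1.0) = 0.30
This agrees with P(a ≤ x ≤ b) = (b − a)/(d − c) = (11.8 − 11.5)/(12.5 − 11.5) = 0.30.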
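The uniform formulas can be collected into a few small helpers. This is a sketch with illustrative names; `uniform_prob` clips the requested interval to the support [c, d], so a question like "less than 11.8" can be passed as the interval from 11.5 to 11.8.

```python
import math


def uniform_prob(a, b, c, d):
    """P(a <= x <= b) for x uniform on [c, d]: overlap length times height 1/(d - c)."""
    lo, hi = max(a, c), min(b, d)      # clip [a, b] to the support [c, d]
    return max(0.0, hi - lo) / (d - c)


def uniform_mean(c, d):
    return (c + d) / 2


def uniform_var(c, d):
    return (d - c) ** 2 / 12


def uniform_sd(c, d):
    return math.sqrt(uniform_var(c, d))


# Bottling example: x uniform on [11.5, 12.5] liters; P(x < 11.8)
print(round(uniform_prob(11.5, 11.8, 11.5, 12.5), 2))
```

The same helpers reproduce the travel-cost and Slater's Buffet results that follow: for example, `uniform_prob(40, 50, 40, 60)` is 0.5 and `uniform_var(40, 60)` is about 33.333.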
Example – cost of travel
Suppose a firm reimburses employees at the rate of 40 cents per kilometre when an employee uses his or her own automobile for company travel. Over the past years, the number of kilometres reimbursed in this manner has been between 100,000 and 150,000. The probability distribution of anticipated annual travel costs for the firm is considered to be a uniform distribution over this range. At the lower bound of 100,000 km, the total cost would be $40,000, and at the upper bound of 150,000 km, the total cost would be $60,000. Since travel is not anticipated to be less than 100,000 km nor greater than 150,000 km, the probability density is zero outside the lower and upper bounds. What is the probability that travel costs are between $40,000 and $50,000? Determine the expected value and variance.

Uniform probability distribution for travel example
Let x be the anticipated total cost of annual travel for the firm in thousands of dollars. Then the uniform probability distribution is:
$$ f(x) = \begin{cases} \dfrac{1}{60 - 40} & \text{for } 40 \le x \le 60 \\ 0 & \text{elsewhere} \end{cases} $$
Uniform probability distribution of total expected travel cost for the firm
Let x represent the total anticipated annual travel costs in thousands of dollars. The distribution has density f(x) = 1/(60 − 40) = 1/20 over the range from 40 to 60 and density 0 elsewhere.
[Figure: rectangle of height f(x) = 1/20 from x = 40 to x = 60; x-axis: travel cost in thousands of dollars.]
Area of the distribution
[Figure: rectangle of height 1/20 from 40 to 60; x-axis: travel costs.]
• Note that the total area under the density function f(x) between 40 and 60 equals 1:
Area = height × length = (1/20) × (60 − 40) = (1/20) × 20 = 1, and this equals the probability that some value within the range from 40 to 60 occurs.
Area and probability over an interval
[Figure: density of height 1/20 from 40 to 60, with the area between 40 and 50 shaded; x-axis: travel costs.]
Therefore, the probability that travel costs are between 40 and 50 is the area under the curve between 40 and 50:
height × length = (1/20) × (50 − 40) = (1/20) × 10 = 0.5, and this is the required probability.
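Expected value and variance for a uniform probability distribution
(Here a and b denote the endpoints of the range, written c and d earlier.)
$$ E(x) = \frac{a + b}{2}, \qquad Var(x) = \sigma^2 = \frac{(b - a)^2}{12}, \qquad \sigma = \frac{b - a}{\sqrt{12}} $$
For the travel cost example:
E(x) = (40 + 60)/2 = 50
Var(x) = (60 − 40)²/12 = 33.333
The standard deviation of x is the square root of 33.333, or 5.774.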
Example: Slater's Buffet
Slater customers are charged for the amount of salad they take. Sampling suggests that the amount of salad taken is uniformly distributed between 5 ounces and 15 ounces. What is the probability that a customer will take between 12 and 15 ounces of salad?
The probability density function is
f(x) = 1/10 for 5 < x < 15
 = 0 elsewhere
where x = salad plate filling weight.
Example: Slater's Buffet
P(12 < x < 15) = (1/10)(3) = 0.3
[Figure: density of height 1/10 from 5 to 15, with the area between 12 and 15 shaded; x-axis: salad weight (oz.).]
Example: Slater's Buffet
• Expected value of x:
E(x) = (a + b)/2 = (5 + 15)/2 = 10
• Variance of x:
Var(x) = (b − a)²/12 = (15 − 5)²/12 = 8.33
Normal probability distribution
• The normal probability distribution is the most common and
important of the continuous probability distributions used in
statistical and econometric work.

• Other names for the normal distribution are the bell curve,
since it has a sort of bell shape, and the Gaussian distribution,
after Gauss, who is considered to be the first to have described
and used the distribution.

Normal Distribution
• The most often used continuous probability distribution is the normal distribution; it is also known as the Gaussian distribution.
• Its graph, called the normal curve, is the bell-shaped curve.
• Such a curve approximately describes many phenomena that occur in nature, industry and research.
 – Physical measurements in areas such as meteorological experiments, rainfall studies and measurement of manufacturing parts are often more than adequately explained with a normal distribution.
• A continuous random variable X having the bell-shaped distribution is called a normal random variable.
Normal Probability Distribution
• Graph of the normal probability density function:
[Figure: bell-shaped curve f(x) plotted against x, centered at μ.]
Formula and parameters for the normal distribution
• There are many normal distributions, but any normal distribution can be described and graphed with two parameters (µ and σ) and the following formula:
$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)} $$
where µ is the mean of the normal distribution,
σ is the standard deviation of the normal distribution,
π is 3.14159, and
e is 2.71828, the base of the natural logarithms.
Normal Distribution
• The mathematical equation for the probability distribution of the normal variable depends upon the two parameters μ and σ, its mean and standard deviation.
[Figure: normal curve f(x) centered at μ, with its spread σ indicated.]
Definition 4.9: Normal distribution
The density of the normal variable x with mean μ and variance σ² is
$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}, \quad -\infty < x < \infty $$
where π = 3.14159… and e = 2.71828…, the Naperian constant.

Some characteristics of the normal distribution
• The curve peaks at the mean, µ, so the mode also equals µ.
• The distribution is symmetric about the centre, µ, so the median is also µ. The distribution is not skewed.
• The tails of the distribution never quite reach the horizontal axis, but get closer and closer to this axis the further x is from the centre. This characteristic means that the distribution is said to be asymptotic to the horizontal axis.
• The probability that a normally distributed variable x takes on values in the range from a to b is the area under f(x) between a and b.
• The total area under the curve is 1; the area under the curve to the left of the centre is 0.5 and the area to the right of the centre is 0.5.
Normal Distribution: Summary
1. ‘Bell-shaped’ & symmetrical
2. Mean, median and mode are equal
3. Location is characterized by the mean, μ
4. Spread is characterized by the standard deviation, σ
5. The random variable has an infinite theoretical range: −∞ to +∞
[Figure: bell-shaped curve f(x) with mean = median = mode at the centre.]
Normal Distribution
[Figure, three panels: normal curves with µ1 < µ2 and σ1 = σ2; normal curves with µ1 = µ2 and σ1 < σ2; normal curves with µ1 < µ2 and σ1 < σ2.]
Properties of Normal Distribution
– The curve is symmetric about a vertical axis through the mean μ.
– The random variable x can take any value from −∞ to ∞.
– The most frequently used descriptive parameters (μ and σ) define the curve itself.
– The mode, which is the point on the horizontal axis where the curve is a maximum, occurs at x = μ.
– The total area under the curve and above the horizontal axis is equal to 1:
$$ \int_{-\infty}^{\infty} f(x)\,dx = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(x-\mu)^2/(2\sigma^2)}\,dx = 1 $$
– The mean and the variance are
$$ \mu = \int_{-\infty}^{\infty} x\,f(x)\,dx = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} x\,e^{-(x-\mu)^2/(2\sigma^2)}\,dx $$
$$ \sigma^2 = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} (x-\mu)^2\,e^{-(x-\mu)^2/(2\sigma^2)}\,dx $$
– The probability of x in the interval (x1, x2), i.e. the area under the curve between x1 and x2, is
$$ P(x_1 < x < x_2) = \frac{1}{\sigma\sqrt{2\pi}} \int_{x_1}^{x_2} e^{-(x-\mu)^2/(2\sigma^2)}\,dx $$
Standard Normal Distribution
• For a normal distribution, computing P(x1 < x < x2) directly from the density for arbitrary x1, x2 and given μ and σ is computationally difficult.
• To avoid this difficulty, the z-transformation is used:
$$ z = \frac{x - \mu}{\sigma} \quad \text{[z-transformation]} $$
• X: normal distribution with mean μ and variance σ².
• Z: standard normal distribution with mean μ = 0 and variance σ² = 1.
• Therefore, with z1 = (x1 − μ)/σ and z2 = (x2 − μ)/σ (so that dx = σ dz), probabilities for X become probabilities for Z:
$$ P(x_1 < x < x_2) = \frac{1}{\sigma\sqrt{2\pi}} \int_{x_1}^{x_2} e^{-(x-\mu)^2/(2\sigma^2)}\,dx = \frac{1}{\sqrt{2\pi}} \int_{z_1}^{z_2} e^{-z^2/2}\,dz = P(z_1 < z < z_2) $$
That is, the density f(x; μ, σ) is replaced by the standard normal density f(z; 0, 1).
Effect of Varying Parameters (μ & σ)
Normal Probability Distribution
Probability is the area under the curve!
$$ P(c \le x \le d) = \int_c^d f(x)\,dx $$
[Figure: normal curve f(x) with the area between c and d shaded.]
Standard Normal Distribution
➢ The standard normal distribution is a normal distribution with µ = 0 and σ = 1. A random variable with a standard normal distribution, denoted by the symbol z, is called a standard normal random variable.
➢ Probabilities associated with values of this standard normal random variable are tabulated.
$$ z = \frac{x - \mu}{\sigma} $$
The random variable Z represents the distance of X from its mean in terms of standard deviations. It is the key step in calculating a probability for an arbitrary normal random variable.

Example 2.
Students’ scores in the Statistics for Engineers course are approximately normally distributed with mean 80 and standard deviation 5.
• What is the probability that a student scores 82 or less?
 P(X ≤ 82) = P(Z ≤ (82 − 80)/5) = P(Z ≤ 0.40) = .6554
• What is the probability that a student scores 90 or more?
 P(X ≥ 90) = P(Z ≥ (90 − 80)/5) = P(Z ≥ 2.00) = 1 − P(Z ≤ 2.00) = 1 − .9772 = .0228
• What is the probability that a student scores 74 or less?
 P(X ≤ 74) = P(Z ≤ (74 − 80)/5) = P(Z ≤ −1.20) = .1151
 If your table does not have negative z values, use P(Z ≤ −1.20) = P(Z ≥ 1.20) = 1 − .8849 = .1151
• What is the probability that a student scores between 78 and 88?
 P(78 ≤ X ≤ 88) = P((78 − 80)/5 ≤ Z ≤ (88 − 80)/5) = P(−0.40 ≤ Z ≤ 1.60) = P(Z ≤ 1.60) − P(Z ≤ −0.40) = .9452 − .3446 = .6006
• What is the probability that the average of three scores is 82 or less?
 The average X̄ of n = 3 scores has standard deviation σ/√n = 5/√3, so
 P(X̄ ≤ 82) = P(Z ≤ (82 − 80)/(5/√3)) = P(Z ≤ .69) = .7549
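The table lookups in Example 2 can be reproduced with the standard normal CDF, which can be written with the error function as Φ(z) = (1 + erf(z/√2))/2 (`math.erf` is in Python's standard library). This is a sketch; `phi` and `normal_cdf` are illustrative names.

```python
import math


def phi(z):
    """Standard normal CDF P(Z <= z), written with the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def normal_cdf(x, mu, sigma):
    return phi((x - mu) / sigma)   # z-transformation, then the "table lookup"


mu, sigma = 80, 5
p_82_or_less = normal_cdf(82, mu, sigma)
p_90_or_more = 1 - normal_cdf(90, mu, sigma)
p_74_or_less = normal_cdf(74, mu, sigma)
p_78_to_88 = normal_cdf(88, mu, sigma) - normal_cdf(78, mu, sigma)
# The mean of n = 3 scores has standard deviation sigma / sqrt(3)
p_avg3_82_or_less = normal_cdf(82, mu, sigma / math.sqrt(3))

for p in (p_82_or_less, p_90_or_more, p_74_or_less, p_78_to_88, p_avg3_82_or_less):
    print(round(p, 4))
```

The first four results agree with the table values to four decimals (up to rounding). The last one comes out near .756 rather than .7549, because the hand calculation rounds z = 0.6928 to .69 before using the table.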
Home Study
You work in Quality Control for GE. Light bulb life has a normal distribution with μ = 2000 hours and σ = 200 hours. What is the probability that a bulb will last
A. between 2000 and 2400 hours?
B. less than 1470 hours?
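Solution A: P(2000 ≤ x ≤ 2400)
z = (x − μ)/σ = (2400 − 2000)/200 = 2.0
[Figure: normal distribution (σ = 200) with the area between μ = 2000 and x = 2400 shaded, beside the standardized normal distribution (σ = 1) with the area between μ = 0 and z = 2.0 shaded; shaded area = .4772.]
P(2000 ≤ x ≤ 2400) = P(0 ≤ z ≤ 2.0) = .4772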
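Solution B: P(x ≤ 1470)
z = (x − μ)/σ = (1470 − 2000)/200 = −2.65
[Figure: normal distribution (σ = 200) with the area below x = 1470 shaded, beside the standardized normal distribution (σ = 1) with the area below z = −2.65 shaded; the area between −2.65 and 0 is .4960 and the area below 0 is .5000.]
P(x ≤ 1470) = P(z ≤ −2.65) = .5000 − .4960 = .0040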
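Both answers can be checked with `statistics.NormalDist` from Python's standard library (available since Python 3.8), whose `cdf` method gives P(X ≤ x) directly, so no manual table lookup is needed.

```python
from statistics import NormalDist

life = NormalDist(mu=2000, sigma=200)   # bulb life in hours

p_a = life.cdf(2400) - life.cdf(2000)   # A. P(2000 <= x <= 2400) = P(0 <= z <= 2.0)
p_b = life.cdf(1470)                    # B. P(x < 1470) = P(z < -2.65)
print(round(p_a, 4), round(p_b, 4))
```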
Finding z-Values for Known Probabilities
What is z, given P(0 ≤ Z ≤ z) = .1217? (standard normal, μ = 0, σ = 1)
Standardized Normal Probability Table (portion):
Z | .00 | .01 | .02
0.0 | .0000 | .0040 | .0080
0.1 | .0398 | .0438 | .0478
0.2 | .0793 | .0832 | .0871
0.3 | .1179 | .1217 | .1255
The value .1217 appears in row 0.3, column .01, so z = 0.31.
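Reading the table "backwards" corresponds to the inverse CDF. `statistics.NormalDist().inv_cdf` returns the z with P(Z ≤ z) equal to a given probability. The table above lists the area between 0 and z, so the 0.5 to the left of the mean has to be added first.

```python
from statistics import NormalDist

z_dist = NormalDist()          # standard normal, mu = 0, sigma = 1

# The table gives the area between 0 and z; add the 0.5 to the left of the mean
area_0_to_z = 0.1217
z = z_dist.inv_cdf(0.5 + area_0_to_z)
print(round(z, 2))
```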
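Other continuous Distributions
The Lognormal Distribution
A random variable X has a lognormal distribution if W = ln X is normally distributed with mean μ and variance σ². The density of X is
$$ f(x) = \frac{1}{x\sigma\sqrt{2\pi}}\, e^{-(\ln x - \mu)^2/(2\sigma^2)}, \quad 0 < x < \infty $$
where π = 3.14159… and e = 2.71828…, the Naperian constant.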

Exponential Distribution
• The length of time or the distance between occurrences of random events can often be described by the exponential probability distribution. For this reason, the exponential distribution is sometimes called the waiting-time distribution.
• The length of time between emergency arrivals at a hospital, the length of time between breakdowns of manufacturing equipment, and the length of time between catastrophic events (e.g., a stock-market crash) are all continuous random phenomena that we might want to describe probabilistically.
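Finding the Area to the Right of a Number a for an Exponential Distribution
For an exponential distribution with mean μ (rate λ = 1/μ), the area to the right of a ≥ 0 is
$$ P(X > a) = e^{-\lambda a} = e^{-a/\mu} $$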
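The right-tail area P(X > a) = e^(−a/μ) can be evaluated directly. The numbers below are hypothetical: a mean time between equipment breakdowns of 40 hours is assumed only to illustrate the formula, and `exponential_right_tail` is an illustrative name.

```python
import math


def exponential_right_tail(a, mean):
    """P(X > a) = exp(-a / mean) for an exponential distribution with the given mean."""
    if a < 0:
        return 1.0   # X is never negative, so all of the area lies to the right of a
    return math.exp(-a / mean)


# Hypothetical: mean time between breakdowns of 40 hours; P(no breakdown within 60 hours)
print(round(exponential_right_tail(60, mean=40), 4))
```

With these assumed numbers the result is e^(−1.5) ≈ 0.2231. Note that P(X > μ) = e^(−1) ≈ 0.368 for any exponential distribution, so less than half of the waiting times exceed the mean.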
Estimation and Statistical Intervals
Estimation theory
Estimation refers to any procedure in which sample information is used to estimate or predict the value of a population parameter.
• A parameter is a characteristic or measure obtained from a population.
• A statistic is a characteristic or measure obtained from a sample.
• An estimator is a sample statistic that is used in estimating a population parameter.
• An estimate is the value of the estimator computed from a sample, used as an estimate of the population parameter.
There are two ways of estimation:
1) Point estimation, and
2) Interval estimation
1. Point Estimation
✓ A single-valued estimate.
✓ A single element chosen from a sampling distribution.
✓ Conveys little information about the actual value of the population parameter and about the accuracy of the estimate.
For example, a population mean (μ) is estimated by a sample mean (x̄), and a population standard deviation (σ) is estimated by the sample standard deviation (s).
Point Estimation: Property of Estimators
• A desirable property that a good estimator should possess is that it be unbiased.
• An estimator is unbiased if, in repeated random samples, the numerical values of the estimator stack up around the population parameter that we are trying to estimate.
• If the values from repeated random samples are centred somewhere else, the estimator exhibits some amount of bias.
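"Stacking up around the parameter" can be illustrated by simulation. This sketch assumes a hypothetical normal population with μ = 10 and σ = 2 (σ² = 4), draws many samples of size 5, and averages three estimators over the repetitions: x̄, s² with divisor n − 1, and the variance with divisor n. The first two centre on the true parameters; the divisor-n version centres on σ²(n − 1)/n = 3.2, i.e. it is biased low.

```python
import random

random.seed(1)                               # fixed seed so the run is repeatable
mu, sigma, n, reps = 10.0, 2.0, 5, 20_000    # hypothetical population and sample size

sum_means = sum_var_n1 = sum_var_n = 0.0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    ss = sum((x - xbar) ** 2 for x in sample)   # sum of squared deviations
    sum_means += xbar
    sum_var_n1 += ss / (n - 1)                  # sample variance s^2 (divisor n - 1)
    sum_var_n += ss / n                         # divisor n

avg_mean = sum_means / reps       # centres on mu = 10
avg_var_n1 = sum_var_n1 / reps    # centres on sigma^2 = 4 (unbiased)
avg_var_n = sum_var_n / reps      # centres on 4 * (n - 1) / n = 3.2 (biased low)
print(round(avg_mean, 2), round(avg_var_n1, 2), round(avg_var_n, 2))
```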
2. Interval Estimation (Confidence Interval)
Point estimation produces a single value as an estimate of a population parameter. The estimate may or may not be close to the actual parameter value; thus, the estimate might be incorrect.
• An interval estimate describes a range of values within which a parameter might lie.
• It is an interval or range of values believed to include the unknown population parameter.
• Associated with the interval there is a measure of confidence that the interval does indeed contain the parameter of interest.
• For these reasons, interval estimation is more desirable than point estimation.
• A confidence interval or interval estimate has two components:
 ✓ A range or interval of values
 ✓ An associated level of confidence
Confidence Interval Estimation of a Population Mean μ
• A confidence interval estimate of μ is an interval estimate, together with a statement of how confident (e.g., 90%, 95%) we are that the interval is correct.
• Depending on whether the population standard deviation σ is known or not, we use different methods to construct a confidence interval for μ.
Confidence Interval Estimation of μ When σ Is Known
• The confidence interval estimate of μ when σ is known is given by:
$$\bar{x} \pm Z\frac{\sigma}{\sqrt{n}}, \qquad Z = \frac{\bar{x}-\mu}{\sigma/\sqrt{n}}$$
• Z is a standard normal random variable with mean 0 and standard deviation 1; Z ~ N(0, 1).
• The z value is determined by the desired level of confidence.
• The quantity $\sigma/\sqrt{n}$ is the standard error of the mean; the product $Z\sigma/\sqrt{n}$ is often called the margin of error or the sampling error.
That is, there is a 98.68% chance that the mean of a random sample of size n will be within 2 units of the population mean μ.
• If the population distribution is normal, the sampling distribution of the mean is normal.
• If the sample size is sufficiently large, the sampling distribution of the mean is approximately normal regardless of the shape of the population distribution (Central Limit Theorem).
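The normal distribution probability density function:
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty$$
where e = 2.718281... and π = 3.14159265...
[Figure: Normal distribution with μ = 0, σ = 1; f(x) plotted against x]
$$P\left(\bar{x} - Z\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + Z\frac{\sigma}{\sqrt{n}}\right) = 0.95 \qquad (Z = ?)$$
$$P\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$$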
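The density formula and the value Z = 1.96 can both be checked numerically. This sketch codes f(x) directly from the formula, compares it with Python's `statistics.NormalDist.pdf`, and then finds the Z that leaves 2.5% in each tail (central area 0.95) with `NormalDist.inv_cdf`.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / sqrt(2 pi sigma^2)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

# The hand-written formula agrees with the standard library's density.
for x in (-2.0, 0.0, 1.5):
    assert math.isclose(normal_pdf(x), NormalDist(0, 1).pdf(x))

# Peak height of the standard normal curve (the ~0.4 maximum in the figure).
print(round(normal_pdf(0.0), 4))   # 0.3989

# Z with 2.5% in each tail, so that the central area is 0.95.
z = NormalDist().inv_cdf(0.975)
print(round(z, 2))                 # 1.96
```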
Normal Probabilities (Empirical Rule)
• The probability that a normal random variable will be within 1 standard deviation of its mean (on either side) is 0.6826, or approximately 0.68.
• The probability that a normal random variable will be within 2 standard deviations of its mean is 0.9544, or approximately 0.95.
• The probability that a normal random variable will be within 3 standard deviations of its mean is 0.9974.
[Figure: Standard normal distribution, f(z) plotted against Z]
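The three empirical-rule probabilities can be recomputed from the standard normal CDF. This is a small sketch using `statistics.NormalDist`; the unrounded values (0.6827, 0.9545, 0.9973) differ in the last digit from the slide's figures, which come from doubling four-digit table entries.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def prob_within(k):
    """P(-k < Z < k): probability of falling within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```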
95% Intervals around the Sample Mean
[Figure: Sampling distribution of the mean, with 95% of the area between μ − 1.96σ/√n and μ + 1.96σ/√n and 2.5% in each tail; below it, intervals x̄ ± 1.96σ/√n drawn around several sample means, those marked * failing to include μ]
• Approximately 95% of the intervals
$$\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$$
around the sample mean can be expected to include the actual value of the population mean, μ. (This happens when the sample mean falls within the 95% interval around the population mean.)
• *5% of such intervals around the sample mean can be expected not to include the actual value of the population mean. (This happens when the sample mean falls outside the 95% interval around the population mean.)
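The "about 95% of intervals include μ" statement is a claim about repeated sampling, so it can be illustrated by simulation. The sketch below assumes a hypothetical normal population (μ = 100, σ = 15) and samples of size n = 25, builds x̄ ± z·σ/√n for each sample, and reports the fraction of intervals that contain μ.

```python
import math
import random
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, confidence, trials, seed=1):
    """Fraction of intervals xbar ± z * sigma / sqrt(n) that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / trials

# Hypothetical population: mu = 100, sigma = 15; samples of size n = 25.
result = coverage(mu=100, sigma=15, n=25, confidence=0.95, trials=10000)
print(result)
```

The printed fraction should land near 0.95; the remaining intervals are the "*" cases in the figure.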
A (1 − α)100% Confidence Interval for μ
We define $z_{\alpha/2}$ as the z value that cuts off a right-tail area of α/2 under the standard normal curve. (1 − α) is called the confidence coefficient, α is called the error probability, and (1 − α)100% is called the confidence level.
$$P(Z > z_{\alpha/2}) = \frac{\alpha}{2}, \qquad P(Z < -z_{\alpha/2}) = \frac{\alpha}{2}, \qquad P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$$
[Figure: Standard normal distribution with central area (1 − α) between −z_{α/2} and z_{α/2} and area α/2 in each tail]
(1 − α)100% Confidence Interval:
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
Critical Values of z and Levels of Confidence
(1 − α)   α/2     z_{α/2}
0.99      0.005   2.576
0.98      0.010   2.326
0.95      0.025   1.960
0.90      0.050   1.645
0.80      0.100   1.282
[Figure: Standard normal distribution with central area (1 − α) between −z_{α/2} and z_{α/2}]
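Each row of this table can be regenerated from the inverse standard normal CDF, since $z_{\alpha/2}$ is the point with cumulative probability 1 − α/2. A short sketch using `statistics.NormalDist.inv_cdf`:

```python
from statistics import NormalDist

def z_critical(confidence):
    """z_{alpha/2}: the z value with right-tail area alpha/2, where alpha = 1 - confidence."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for cl in (0.99, 0.98, 0.95, 0.90, 0.80):
    print(cl, round((1 - cl) / 2, 3), round(z_critical(cl), 3))
```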


The Level of Confidence and the Width of the Confidence Interval
When sampling from the same population using a fixed sample size, the higher the confidence level, the wider the confidence interval.
[Figure: Standard normal distribution, 80% confidence interval (left) and 95% confidence interval (right)]
80% Confidence Interval:
$$\bar{x} \pm 1.28\frac{\sigma}{\sqrt{n}}$$
95% Confidence Interval:
$$\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$$
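To see the effect numerically, the sketch below holds σ and n fixed at hypothetical values (σ = 10, n = 25, so σ/√n = 2) and computes the half-width $z_{\alpha/2}\sigma/\sqrt{n}$ at several confidence levels. Only the z multiplier changes, so the interval widens as confidence rises.

```python
import math
from statistics import NormalDist

def half_width(confidence, sigma, n):
    """Margin of error z_{alpha/2} * sigma / sqrt(n) for a given confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / math.sqrt(n)

# Hypothetical values: sigma = 10, n = 25 (so sigma / sqrt(n) = 2).
for cl in (0.80, 0.90, 0.95, 0.99):
    print(cl, round(half_width(cl, sigma=10, n=25), 2))
```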
The Sample Size and the Width of the Confidence Interval
When sampling from the same population using a fixed confidence level, the larger the sample size n, the narrower the confidence interval.
[Figure: Sampling distribution of the mean, 95% confidence interval with n = 20 (left) and n = 40 (right)]
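Because the half-width is $1.96\sigma/\sqrt{n}$, doubling n from 20 to 40 shrinks the interval by a factor of √2 ≈ 1.414, not by half. A small sketch with a hypothetical σ = 1:

```python
import math
from statistics import NormalDist

z95 = NormalDist().inv_cdf(0.975)  # about 1.96

def half_width_95(sigma, n):
    """Half-width of a 95% interval, z * sigma / sqrt(n)."""
    return z95 * sigma / math.sqrt(n)

# Hypothetical sigma = 1; compare the two sample sizes used in the figure.
w20 = half_width_95(1.0, 20)
w40 = half_width_95(1.0, 40)
print(round(w20, 3), round(w40, 3), round(w20 / w40, 3))
```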


Finding Probabilities of the Standard Normal Distribution: P(0 < Z < 1.56)
[Figure: Standard normal distribution with the area between 0 and 1.56 shaded]
Standard Normal Probabilities (Z-table)
z   .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
Look in the row labeled 1.5 and the column labeled .06 to find P(0 ≤ Z ≤ 1.56) = 0.4406.
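Each entry of this table is the area between 0 and z, which equals Φ(z) − 0.5, where Φ is the standard normal CDF. A sketch that reproduces the lookup with `statistics.NormalDist.cdf`:

```python
from statistics import NormalDist

def area_zero_to(z):
    """P(0 < Z < z) for z >= 0, the quantity tabulated in the Z-table."""
    return NormalDist().cdf(z) - 0.5

print(round(area_zero_to(1.56), 4))  # row 1.5, column .06 -> 0.4406
```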
Example 1
A credit union wants to estimate the mean amount of outstanding loans. Past experience reveals that the standard deviation is 250 birr. Determine a 98% confidence interval estimate for the mean of all outstanding loans (the population mean) if a random sample of 100 outstanding loans has a sample mean of 1,950 birr.
Solution
x̄ = 1,950, σ = 250, n = 100, C.L. = 98%, z = 2.33
$$\frac{\sigma}{\sqrt{n}} = \frac{250}{\sqrt{100}} = \frac{250}{10} = 25$$
The interval:
$$\bar{x} \pm Z\frac{\sigma}{\sqrt{n}} = 1{,}950 \pm 2.33(25) = 1{,}950 \pm 58.25$$
The interval is then from 1,891.75 to 2,008.25.
Interpretation:
The credit union can say with 98% confidence that the mean amount of outstanding loans is between 1,891.75 and 2,008.25 birr.
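The calculation above can be wrapped in a small helper. This is a sketch, not a prescribed procedure: `z_interval` is a hypothetical name, and it accepts either a table value of z (such as the rounded 2.33 used in the solution) or computes $z_{\alpha/2}$ from `statistics.NormalDist`. The unrounded z ≈ 2.326 gives a slightly narrower interval.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence, z=None):
    """Confidence interval xbar ± z * sigma / sqrt(n) when sigma is known.

    If z is not supplied, z_{alpha/2} is computed from the standard normal distribution.
    """
    if z is None:
        z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

# Credit union example: xbar = 1950, sigma = 250, n = 100, 98% confidence.
table_lo, table_hi = z_interval(1950, 250, 100, 0.98, z=2.33)  # rounded table value
exact_lo, exact_hi = z_interval(1950, 250, 100, 0.98)          # z of about 2.326
print(round(table_lo, 2), round(table_hi, 2))
print(round(exact_lo, 2), round(exact_hi, 2))
```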
Homework
The population consists of the Fortune 500 companies as ranked by revenues. You are trying to find out the average revenues for the companies on the list. The population standard deviation is $15,056.37. A random sample of 30 companies obtains a sample mean of $10,672.87. Give a 95% and a 90% confidence interval for the average revenues.
Confidence Interval or Interval Estimate for μ When σ Is Unknown: The t Distribution
If the population standard deviation σ is not known, replace σ with the sample standard deviation s. The confidence interval estimate of μ is given by:
$$\bar{x} \pm t\frac{s}{\sqrt{n}}, \qquad t = \frac{\bar{X}-\mu}{s/\sqrt{n}}$$
The value of t is read from the t distribution table at n − 1 degrees of freedom and at the desired level of confidence.
Cont’d
• The t distribution is called Student’s t distribution. It resembles the standard normal distribution Z: it has a bell-shaped, symmetrical distribution curve.
• The t distribution curve is flatter and has fatter tails than the standard normal.
• The mean of a t distribution is zero.
• For df > 2, the variance of t is df/(df − 2). This is greater than 1, but approaches 1 as the number of degrees of freedom increases.
• The t distribution approaches the standard normal as the number of degrees of freedom increases.
[Figure: Standard normal curve compared with t curves for df = 10 and df = 20, all centred at μ]
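The variance formula df/(df − 2) shows directly why the t curve has fatter tails for small samples and why it approaches the standard normal (variance 1) as df grows. A tiny sketch evaluating it for a few degrees of freedom:

```python
def t_variance(df):
    """Variance of Student's t distribution, df / (df - 2), defined for df > 2."""
    if df <= 2:
        raise ValueError("variance is defined only for df > 2")
    return df / (df - 2)

for df in (3, 5, 10, 20, 30, 120):
    print(df, round(t_variance(df), 4))
# 3 3.0
# 5 1.6667
# 10 1.25
# 20 1.1111
# 30 1.0714
# 120 1.0169
```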
Cont’d
• The t distributions approach the standard normal distribution as n increases.
• As a result, when σ is not known and n > 30, we can use the standard normal distribution (z value table) to construct an approximate interval estimate for μ.
• When n < 30 and σ is not known, the t distribution table is used.
Cont’d
A (1 − α)100% confidence interval for μ when σ is not known:
$$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$$
where $t_{\alpha/2}$ is the value of the t distribution with n − 1 degrees of freedom that cuts off a tail area of α/2 to its right.
The t Distribution Table
df  t0.100 t0.050 t0.025 t0.010 t0.005
--- ------ ------ ------ ------ ------
1   3.078  6.314  12.706 31.821 63.657
2   1.886  2.920  4.303  6.965  9.925
3   1.638  2.353  3.182  4.541  5.841
4   1.533  2.132  2.776  3.747  4.604
5   1.476  2.015  2.571  3.365  4.032
6   1.440  1.943  2.447  3.143  3.707
7   1.415  1.895  2.365  2.998  3.499
8   1.397  1.860  2.306  2.896  3.355
9   1.383  1.833  2.262  2.821  3.250
10  1.372  1.812  2.228  2.764  3.169
11  1.363  1.796  2.201  2.718  3.106
12  1.356  1.782  2.179  2.681  3.055
13  1.350  1.771  2.160  2.650  3.012
14  1.345  1.761  2.145  2.624  2.977
15  1.341  1.753  2.131  2.602  2.947
16  1.337  1.746  2.120  2.583  2.921
17  1.333  1.740  2.110  2.567  2.898
18  1.330  1.734  2.101  2.552  2.878
19  1.328  1.729  2.093  2.539  2.861
20  1.325  1.725  2.086  2.528  2.845
21  1.323  1.721  2.080  2.518  2.831
22  1.321  1.717  2.074  2.508  2.819
23  1.319  1.714  2.069  2.500  2.807
24  1.318  1.711  2.064  2.492  2.797
25  1.316  1.708  2.060  2.485  2.787
26  1.315  1.706  2.056  2.479  2.779
27  1.314  1.703  2.052  2.473  2.771
28  1.313  1.701  2.048  2.467  2.763
29  1.311  1.699  2.045  2.462  2.756
30  1.310  1.697  2.042  2.457  2.750
40  1.303  1.684  2.021  2.423  2.704
60  1.296  1.671  2.000  2.390  2.660
120 1.289  1.658  1.980  2.358  2.617
∞   1.282  1.645  1.960  2.326  2.576
[Figure: t distribution with df = 10; area 0.10 beyond each of ±1.372 and area 0.025 beyond each of ±2.228]
Whenever σ is not known (and the population is assumed normal), the correct distribution to use is the t distribution with n − 1 degrees of freedom. Note, however, that for large degrees of freedom, the t distribution is approximated well by the Z distribution.
Example 1
A stock market analyst wants to estimate the average return on a certain stock. A random sample of 15 days yields an average (annualized) return of x̄ = 10.37% and a standard deviation of s = 3.5%. Assuming a normal population of returns, give a 95% confidence interval for the average return on this stock.
The critical value of t for df = n − 1 = 15 − 1 = 14 and a right-tail area of 0.025 is t0.025 = 2.145:
df  t0.100 t0.050 t0.025 t0.010 t0.005
--- ------ ------ ------ ------ ------
1   3.078  6.314  12.706 31.821 63.657
... ...    ...    ...    ...    ...
13  1.350  1.771  2.160  2.650  3.012
14  1.345  1.761  2.145  2.624  2.977
15  1.341  1.753  2.131  2.602  2.947
... ...    ...    ...    ...    ...
The corresponding confidence interval or interval estimate is:
$$\bar{x} \pm t_{0.025}\frac{s}{\sqrt{n}} = 10.37 \pm 2.145\frac{3.5}{\sqrt{15}} = 10.37 \pm 1.94 = [8.43,\ 12.31]$$
Home Study Example
Example 2
A random sample of 100 customer accounts at a large firm is selected for the purpose of estimating the mean number of transactions per year for each customer. The sample mean is 43 and the sample standard deviation is 12. Determine a 90% confidence interval estimate for μ.
Solution: In this problem, since σ is not known, the appropriate distribution is the t distribution. However, since n > 30, we can approximate it by the standard normal distribution.
n = 100, x̄ = 43, s = 12, C.L. = 0.90, and z = 1.64
$$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{12}{\sqrt{100}} = 1.2$$
The interval estimate is
$$\bar{x} \pm z\,s_{\bar{x}} = 43 \pm 1.64(1.2) = 43 \pm 1.97$$
Thus, the interval is from 41.03 to 44.97.
We are 90% confident that the mean number of transactions per year per customer falls between 41.03 and 44.97.
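The large-sample shortcut in Example 2 simply substitutes s for σ in the z-interval. A sketch with a hypothetical helper name, computing $z_{\alpha/2}$ from `statistics.NormalDist` (about 1.645 for 90%):

```python
import math
from statistics import NormalDist

def large_sample_interval(xbar, s, n, confidence):
    """Approximate interval xbar ± z * s / sqrt(n), using s in place of the unknown sigma (large n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = s / math.sqrt(n)          # estimated standard error of the mean
    return xbar - z * se, xbar + z * se

low, high = large_sample_interval(43, 12, 100, 0.90)
print(round(low, 2), round(high, 2))   # 41.03 44.97
```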
Cont’d
Example 3
A quality control inspector of a company selects frequent random samples of size n = 6 from the output of an automatic machine to check on the average diameter μ of parts being made. Diameters are normally distributed. The sample has a mean diameter of 2.0016 inches and a standard deviation of 0.0012 inches. Construct the 99% confidence interval for μ.
Solution: In this problem σ is not known and n < 30. Therefore, we use the t distribution.
x̄ = 2.0016, C.L. = 0.99, α = 1 − 0.99 = 0.01, df = n − 1 = 6 − 1 = 5,
$$s_{\bar{x}} = \frac{0.0012}{\sqrt{6}} = 0.0004898, \qquad t_{\alpha/2,\nu} = t_{0.005,5} = 4.032$$
The interval is given by
$$\bar{x} \pm t_{\alpha/2,\nu}\,s_{\bar{x}} = 2.0016 \pm (4.032 \times 0.0004898) = 2.0016 \pm 0.0019748$$
This means that we are 99% confident that μ falls between 1.9996252 and 2.0035748.
Determination of Sample Size
• Collecting valid information through sampling requires
careful planning, including determination of an appropriate
sample size.
• How large should the sample size be? The answer depends
on the following three factors.
1. How precise (narrow) do we want a confidence interval estimate
to be?
2. How confident do we want to be that the interval estimate is
correct?
3. What is the standard deviation of the population in question?
• Generally, the higher the desired precision or level of confidence, the larger the sample size must be.
• Also, the larger the population variability, the larger the sample size must be.
Sample Size for Interval Estimation of μ
• Consider
$$z = \frac{\bar{x}-\mu}{\sigma/\sqrt{n}}$$
Solving this for n, we get
$$n = \frac{z^2\sigma^2}{(\bar{x}-\mu)^2}$$
This is the formula for computing the sample size for interval estimation of μ.
• Three quantities determine the value of n:
– The value of z, reflecting the desired confidence level
– The absolute value of (x̄ − μ), which represents the maximum error in estimation
– An estimate of the variance (or standard deviation) of the population in question. When σ is not known, the standard deviation s from a pilot sample is used in its place.
Cont’d
Example:
For the purpose of illustration, assume the desired confidence level is 95%. If σ = 15 and we want an estimate of μ with a maximum error in estimation of 5, the required sample size is computed as follows.
Solution:
$$n = \frac{z^2\sigma^2}{(\bar{x}-\mu)^2}$$
C.L. = 0.95, α = 0.05, z = 1.96
σ = 15
|x̄ − μ| = 5
$$n = \frac{(1.96)^2(15)^2}{5^2} = 34.5744 \approx 35$$
In sample size determination, we always round up to the next whole number, whatever the decimal part is.
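The sample-size formula with its "always round up" rule maps onto `math.ceil`. A minimal sketch (hypothetical function name) reproducing the example:

```python
import math

def sample_size_for_mean(z, sigma, max_error):
    """n = (z * sigma / E)^2, always rounded up to the next whole number."""
    return math.ceil((z * sigma / max_error) ** 2)

print(sample_size_for_mean(z=1.96, sigma=15, max_error=5))   # 35
```

Rounding up rather than to the nearest integer guarantees the maximum error is no larger than requested; rounding 34.57 down to 34 would leave the error slightly above 5.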
HYPOTHESIS TESTING
Hypothesis Testing
• The techniques of statistical inference can be classified into two broad categories: parameter estimation and
hypothesis testing. We have already briefly introduced the general idea of point estimation of process
parameters.
• A statistical hypothesis is a statement about the values of the parameters of a probability distribution. For
example, suppose we think that the mean inside diameter of a bearing is 1.500 in. We may express this statement
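in a formal manner as
$$H_0: \mu = 1.500 \text{ in.} \qquad H_1: \mu \neq 1.500 \text{ in.}$$
where H0 is the null hypothesis and H1 is the alternative hypothesis.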
Hypothesis Testing
• An important part of any hypothesis testing problem is determining the parameter values specified in the null and
alternative hypotheses.
• Generally, this is done in one of three ways.
• First, the values may result from past evidence or knowledge. This happens frequently in statistical quality
control, where we use past information to specify values for a parameter corresponding to a state of
control, and then periodically test the hypothesis that the parameter value has not changed.
• Second, the values may result from some theory or model of the process.
• Finally, the values chosen for the parameter may be the result of contractual or design specifications, a situation
that occurs frequently.
Hypothesis Testing

• To test a hypothesis, we take a random sample from the population under study, compute an appropriate test statistic, and then either reject or fail to reject the null hypothesis. The set of values of the test statistic leading to rejection of 𝐻0 is called the critical region or rejection region for the test.
• Two kinds of errors may be committed when testing hypotheses. If the null hypothesis is rejected when it is true, then a type I error has occurred. If the null hypothesis is not rejected when it is false, then a type II error has been made. The probabilities of these two types of errors are denoted as
α = P(type I error) = P(reject H0 | H0 is true)
β = P(type II error) = P(fail to reject H0 | H0 is false)
The power of the test is
Power = 1 − β = P(reject H0 | H0 is false)
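As an illustration of a test statistic and its critical region, the sketch below assumes a two-sided z test for a mean with known σ (the case introduced at the end of this section) applied to the bearing hypothesis H0: μ = 1.500 in. All numbers (σ = 0.010 in., n = 16, sample means) are hypothetical, and `z_test_two_sided` is not a library function.

```python
import math
from statistics import NormalDist

def z_test_two_sided(xbar, mu0, sigma, n, alpha):
    """Two-sided test of H0: mu = mu0 with known sigma.

    Returns the test statistic Z0 and whether it falls in the critical region |Z0| > z_{alpha/2}.
    """
    z0 = (xbar - mu0) / (sigma / math.sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z0, abs(z0) > z_crit

# Hypothetical bearing data: mu0 = 1.500 in, sigma = 0.010 in, n = 16, sample mean 1.503 in.
z0, reject = z_test_two_sided(1.503, 1.500, 0.010, 16, alpha=0.05)
print(round(z0, 2), "reject H0" if reject else "fail to reject H0")
```

Here Z0 = 1.2 lies inside (−1.96, 1.96), so H0 is not rejected; a sample mean of 1.506 in. would give Z0 = 2.4 and fall in the critical region.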


Hypothesis Testing
• Thus, the power is the probability of correctly rejecting 𝐻0. In quality control work, α is sometimes called the producer’s risk, because it denotes the probability that a good lot will be rejected, or the probability that a process producing acceptable values of a particular quality characteristic will be rejected as performing unsatisfactorily.
• In addition, β is sometimes called the consumer’s risk, because it denotes the probability of accepting a lot of poor quality, or of allowing a process that is operating in an unsatisfactory manner relative to some quality characteristic to continue in operation.
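The consumer's risk β and the power 1 − β can be computed for a two-sided z test with known σ. This sketch assumes that test and uses hypothetical values (μ0 = 1.500 in., true mean 1.505 in., σ = 0.010 in., n = 16, α = 0.05). With shift δ = μ − μ0, H0 is wrongly retained when the test statistic lands inside the acceptance region, giving β = Φ(z_{α/2} − δ√n/σ) − Φ(−z_{α/2} − δ√n/σ).

```python
import math
from statistics import NormalDist

def beta_two_sided_z(mu0, mu_true, sigma, n, alpha):
    """Type II error probability of the two-sided z test of H0: mu = mu0 when the true mean is mu_true."""
    std = NormalDist()
    z_half = std.inv_cdf(1 - alpha / 2)
    shift = (mu_true - mu0) * math.sqrt(n) / sigma   # delta * sqrt(n) / sigma
    # H0 is (wrongly) not rejected when Z0 lands inside (-z_half, z_half).
    return std.cdf(z_half - shift) - std.cdf(-z_half - shift)

# Hypothetical values: mu0 = 1.500 in, true mean 1.505 in, sigma = 0.010 in, n = 16, alpha = 0.05.
beta = beta_two_sided_z(1.500, 1.505, 0.010, 16, alpha=0.05)
power = 1 - beta
print(round(beta, 3), round(power, 3))   # consumer's risk and power
```

When the true mean equals μ0 the function returns 1 − α, which is a quick sanity check that the acceptance region is set up correctly.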



1. Inference on the Mean of a Population, Variance Known
Confidence Interval on the Mean with Variance Known
• Confidence Intervals. An interval estimate of a parameter is the interval between two statistics that includes the true value of the parameter with some probability.
END
