Business Statistics Chapter I


BUSINESS STATISTICS CHAPTER ONE

CHAPTER -I
SAMPLING AND SAMPLING DISTRIBUTIONS

Sampling Theory
Statisticians use data in many different ways and for different purposes. For example, a manufacturer might
want to know something about the consumers who will purchase its product so that it can plan an
effective marketing strategy. In another situation, the management of a company might survey its
employees to assess their needs before negotiating a new contract with the employees' union. Trends in
areas such as the stock market can be analyzed, enabling prospective buyers to make better-informed
decisions about which stocks to purchase. These examples illustrate a few situations in which
collecting data helps people decide on a course of action. The data may be collected
from either a finite or an infinite population. If data are collected from the whole universe by the census
method, the quality of the data obtained is high. However, it is often unrealistic to study the whole
universe. In such circumstances we take a sample, a subset of the population, and use it to study the
population. Sampling is the part of statistical practice concerned with the selection of individual
observations intended to yield some knowledge about a population of concern, especially for the purposes of
statistical inference. Before discussing the specific types of sampling methods, it is useful
to be acquainted with the following terms:
Basic Definitions
Population:- The group of people, items or units under investigation; that is, the set of all the elements of
interest in a study. A population can be finite or infinite.
Sample:- A subset of the population.
Census:- A complete enumeration survey method to collect data on the entire population.
Sample survey:- A survey to collect data on a sample.
Sampling unit:- A single element or group of elements subject to selection in the sample.
Sampling frame:- A list of elements from which a sample may be drawn; also called the working population.
Sampling:- The process or method of selecting a sample from the population.
Parameter:- A characteristic or measure obtained by using all the data values for a specific population.
Statistic:- A characteristic or measure obtained from a sample.
Statistics:- The art and science of collecting, analyzing, presenting, and interpreting data.
Statistical inference:- The process of using data obtained from a sample to make estimates or test
hypotheses about the characteristics of a population.
The Need for Samples
- Cheaper than a census
- Greater speed: takes less time than a census
- Economy of effort, as relatively fewer staff are needed
- More detailed information can be collected from a sample
- Better quality of interviewing, supervision and other related activities

DBU Page 1 of 20
Dept of ACFN

Essentials and Limitations of Sampling


i. Representativeness: A sample should be selected so that it truly represents the universe; otherwise the
results obtained may be misleading. To ensure representativeness, a random method of selection should
be used.
ii. Adequacy: The size of the sample should be adequate; otherwise it may not represent the characteristics of
the universe.
iii. Independence: All items of the sample should be selected independently of one another. By
independence of selection we mean that the selection of a particular item in one draw has no influence on
the probabilities of selection in any other draw.
iv. Homogeneity: This means that there is no basic difference in the nature of the units of the universe and
those of the sample. If two samples are taken from the same universe, they should give more or less the same
results.
Limitations of sampling
- It fails to provide information on individual accounts
- Sampling gives rise to certain errors
- It is difficult to check for omissions of certain units
Designing and Conducting a Sampling Study
One of the objectives of a sample survey is to estimate certain population parameters. A point to note is that
the true value of a population parameter is an unknown constant. It can be determined only by a complete
study of the population. The concept of statistical inference comes into play whenever this is impossible or
practically not feasible. A statistic, which is a sample-based quantity, must then serve as our source of
information about the value of the parameter. In this context, there are three crucial points.
- As the sample is only part of the population, the numerical value of a statistic is normally not expected
to give us the exact value of the parameter.
- Since different samples can be drawn from a particular population, the observed value of the statistic
depends on the particular sample that is chosen.
- The value of the statistic will therefore vary from one sample to another.
The following table presents the population parameters and the corresponding sample statistics:
Measure               Population parameter    Sample statistic
Size                  N                       n
Mean                  $\mu$                   $\bar{X}$
Standard deviation    $\sigma$                s
Proportion            P                       $\bar{p}$
Bias and Errors in Sampling Survey
A sample is expected to mirror the population from which it comes; however, there is no guarantee that any
sample will be precisely representative of that population. Chance may dictate that
a disproportionate number of untypical observations is made. In testing fuses, for example, the
sample may contain a larger or smaller share of faulty fuses than the true population proportion of faulty
fuses. As a result, there are two types of error in a sample survey: sampling error and non-sampling error.


Sampling error
Sampling error is the difference between a sample measure and the corresponding population measure
obtained using identical procedures. It arises because the sample is not a perfect representation of
the population.
There are two basic causes of sampling error. One is chance: the error that occurs just because of
bad luck. This may result in untypical choices. Unusual units in a population do exist, and there is always a
possibility that an abnormally large number of them will be chosen. The main protection against this kind of
error is to use a large enough sample. The second cause of sampling error is sampling bias.
Sampling bias is a tendency to favor the selection of units that have particular characteristics. Sampling
bias is usually the result of a poor sampling plan. Bias can be very costly and has to be guarded against as
much as possible. The means of selecting the units of analysis must be designed to avoid the more obvious
forms of bias. For example, suppose you would like to know the average income of some
community and you decide to use telephone numbers to select a sample of the total population in a
locality where only rich and middle-class households have telephone lines. You will end up with a high
average income, which will lead to the wrong policy decisions.
Non-sampling error (also called systematic or measurement error)
The other main cause of unrepresentative samples is non-sampling error. This type of error can occur
whether a census or a sample is being used. These errors are not due to chance fluctuations. Like
sampling error, non-sampling error may either be produced by participants in the statistical study or be an
innocent by-product of the sampling plans and procedures. A non-sampling error is an error that results
solely from the manner in which the observations are made. Examples of non-sampling error are inaccurate
measurements due to malfunctioning instruments or poor procedures, and errors at different stages of
processing the data, such as editing and tabulating.
Types of Samples and Sampling Techniques
There are two types of sampling techniques: probability and non-probability sampling. In a probability
sample, each member of the population has a known, nonzero chance of being selected, whereas in a
non-probability sample some people or things have a greater, but unknown, chance of selection than others.
Probability Sampling
A sampling technique in which every member of the population has a known, nonzero probability of
selection. In the simplest case, simple random sampling, every member of the population has an equal
probability of being selected. To obtain samples that are unbiased, statisticians use four basic methods
of sampling:
Random
Systematic
Stratified and
Cluster sampling
1. Random sampling (also called simple random sampling): A sampling procedure that assures each
element in the population of an equal chance of being included in the sample. Random samples are
selected by using chance methods or random numbers. It is the standard against which other methods are
sometimes evaluated, and it is suitable where the population is relatively small and the sampling frame is
complete and up to date.


2. Systematic sampling
A sampling procedure in which the sampling interval K = N/n is computed, a starting point is selected at
random from the first K units on the list, and every Kth unit after it is then selected. For example, for a
population of 500 and a sample of 100, the sampling fraction is 1/5, i.e. you will select one person out of
every five in the population. A random number is needed only to decide on the starting point. With a
sampling fraction of 1/5, the starting point must be within the first 5 people in your list.
Disadvantage: Effect of periodicity (bias caused by particular characteristics arising in the sampling frame
at regular intervals). An example of this would occur if you used a sampling frame of adult residents in an
area composed predominantly of couples or young families. If this list was arranged Husband / Wife /
Husband / Wife etc. and every tenth person was interviewed, only people of one sex would be selected:
all wives if the starting position is even, all husbands if it is odd.
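The systematic procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `systematic_sample` and the numbered frame of 500 units are hypothetical, and the interval is rounded down when N is not an exact multiple of n.

```python
import random

def systematic_sample(frame, n, rng=None):
    """Draw a systematic sample of n units from an ordered list (the sampling frame)."""
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the population size")
    rng = rng or random.Random()
    k = len(frame) // n          # sampling interval K = N / n (rounded down)
    start = rng.randrange(k)     # random starting point among the first K units
    return frame[start::k][:n]   # every Kth unit from the start, capped at n units

population = list(range(1, 501))  # hypothetical frame: units numbered 1 to 500
sample = systematic_sample(population, 100, random.Random(1))
print(len(sample), sample[:5])
```

Only the starting point is random; every later unit is fixed by it, which is why a periodic ordering of the frame can bias the sample.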
3. Stratified Sampling
A probability sampling procedure in which simple random subsamples are drawn from within each stratum of
the population, the members of a stratum being more or less alike on some characteristic. Researchers obtain
stratified samples by dividing the population into non-overlapping groups called strata according to some
characteristic that is important to the study, such as age, sex, occupation, education, religion or region,
and then sampling from each group by the simple random sample method. The process of forming strata is
called stratification, and each individual group is referred to as a stratum. Thus a population can be
stratified if its members have readily identifiable characteristics.
The advantages of using stratified random sampling are:
- It reflects the characteristics of the population more accurately than simple random sampling and
systematic random sampling.
- It is more cost effective than simple random sampling.
A sample drawn by stratified sampling can be a proportional or a disproportional stratified sample.
Proportional stratified sample:- A stratified sample in which the number of sampling units drawn from
each stratum is in proportion to the population size of that stratum.
Disproportional stratified sample:- A stratified sample in which the sample size for each stratum is
allocated according to analytical considerations.
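As a rough illustration of proportional allocation, the Python sketch below splits a total sample size across strata in proportion to their population sizes. The function name and the three age strata are hypothetical, and simple rounding is assumed, so for awkward stratum sizes the allocated numbers may not add up exactly to n.

```python
def proportional_allocation(stratum_sizes, n):
    """Split a total sample size n across strata in proportion to stratum size."""
    total = sum(stratum_sizes.values())
    # n_h = n * N_h / N for each stratum h, rounded to a whole number of units
    return {name: round(n * size / total) for name, size in stratum_sizes.items()}

# Hypothetical population of 10,000 people divided into three age strata
strata = {"under 30": 2000, "30 to 50": 5000, "over 50": 3000}
allocation = proportional_allocation(strata, 200)
print(allocation)  # {'under 30': 40, '30 to 50': 100, 'over 50': 60}
```

A simple random sample of the allocated size would then be drawn within each stratum.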
4. Cluster sampling
An economically efficient sampling technique in which the primary sampling unit is not the individual
element in the population but a large cluster of elements; clusters are selected randomly. It involves
drawing several samples in stages from a large, heterogeneous population so as to
minimize the cost of final interviewing. Here the population is divided into groups called clusters.
Basic procedure: First draw a sample of areas. Initially large areas are selected, then progressively smaller
areas within the larger areas are sampled. Eventually you end up with a sample of households and use a
method of selecting individuals from these households. The area sample is the most popular type of cluster
sample, and cluster samples are frequently used when lists of the population are not available.
A problem may arise with cluster sampling if the characteristics and attitudes of the elements within a
cluster are too similar. This problem may be mitigated by constructing clusters composed of diverse
elements and by selecting a large number of sampled clusters.


Non-Probability Samples
A sampling technique in which units of the sample are selected on the basis of personal judgment or
convenience; some members of the population have a greater, but unknown, chance of selection than
others.
Non-probability sampling is suitable when a complete sampling frame is not available
for certain groups of the population, and when the population under study is so heterogeneous that
the cluster method is ineffective. Another factor to bear in mind is that many of the
probability sampling methods described above may require researchers to undertake a
postal or telephone survey or to go from house to house. Some of the problems of low response
rates are discussed later on in this workbook, but you might find that a probability
sample with a poor response rate doesn't in the end give you a particularly good representation of the
population being examined.
Advantages of non-probability methods:
Cheaper
Used when sampling frame is not available
Useful when population is so widely dispersed that cluster sampling would not be efficient
Often used in exploratory studies, e.g. for hypothesis generation
1. Convenience sampling
The sampling procedure of obtaining those people or units that are most conveniently available. The
sample comprises subjects who are simply available in a convenient way to the researcher. There is no
randomness and the likelihood of bias is high, so the results cannot safely be generalized to the
population. However, this method is often the only feasible one, particularly for students or others with
restricted time and resources, and it can legitimately be used provided its limitations are clearly
understood and stated. Researchers generally use convenience samples to obtain a large number of
completed questionnaires quickly and economically, or when obtaining a sample through other means is
impractical.
2. Judgmental (Purposive) Sampling
A nonprobability sampling technique in which an experienced individual selects the sample based on
personal judgment about some appropriate characteristic of the sample members. Researchers select
samples that, in their judgment, satisfy their specific purposes, even if the samples are not fully
representative. The method is highly subjective. It is often used in political polling to forecast election
results from districts chosen because their pattern has in the past given a good idea of the outcome for
the whole electorate.
3. Quota Sampling
A nonprobability sampling procedure that ensures that various subgroups of a population will be
represented on pertinent characteristics to the exact extent that the investigator desires. It differs from
stratified sampling in that the interviewer is responsible for finding enough people to meet each quota;
together the interview quotas yield a sample that represents the desired proportion of each subgroup.
Quota sampling involves fixing certain quotas, which are to be fulfilled by the interviewers, and is
often used in market research.
The major advantages of quota sampling over probability sampling are speed of data collection, lower
costs, and convenience. Its disadvantage is that interviewers choose whom they like (within the quota
criteria) and may therefore select those who are easiest to interview, so bias can result because selection
is not random.


4. Snowball sampling
A sampling procedure in which initial respondents are selected by probability methods and additional
respondents are obtained from information provided by the initial respondents. With this approach, you
initially contact a few potential respondents and then ask them whether they know of anybody with the
same characteristics that you are looking for in your research. For example, if you wanted to interview a
sample of vegetarians, cyclists, people with a particular disability or people who support a particular
political party, your initial contacts may well know of others (for example, through a support group).
5. Self-selection
Self-selection is perhaps self-explanatory. Respondents themselves decide that they would like to take part
in your survey.
SAMPLING DISTRIBUTIONS
Basic Definitions
A sampling distribution is a probability distribution for the possible values of a sample statistic, such as a
sample mean.
The standard normal distribution is a continuous, symmetric, bell-shaped distribution of a variable with
a mean of 0 and a standard deviation of 1. It is used to determine probabilities for normally distributed
individual measurements, given their mean and standard deviation. Symbolically, the variable is the
measurement X, with population mean $\mu$ and population standard deviation $\sigma$. In contrast to such
distributions of individual measurements, a sampling distribution is a probability distribution for the possible
values of a sample statistic.
Sampling Distribution of the Mean
The sampling distribution of the mean is the probability distribution of the means, $\bar{X}$, of all simple
random samples of a given sample size n that can be drawn from the population.
NB: the sampling distribution of the mean is not the sample distribution, which is the distribution of the
measured values of X in one random sample. Rather, the sampling distribution of the mean is the
probability distribution for $\bar{X}$, the sample mean.
For any given sample size n taken from a population with mean $\mu$ and standard deviation $\sigma$, the
value of the sample mean $\bar{X}$ would vary from sample to sample if several random samples were obtained
from the population. This variability serves as the basis for the sampling distribution.
The sampling distribution of the mean is described by two parameters: the expected value
$E(\bar{X}) = \mu_{\bar{X}}$, or mean of the sampling distribution of the mean, and the standard deviation of
the mean, $\sigma_{\bar{X}}$, called the standard error of the mean.
Properties of the Sampling Distribution of Means
1. The mean of the sampling distribution of the means is equal to the population mean:
$\mu_{\bar{X}} = E(\bar{X}) = \mu$.
2. The standard deviation of the sampling distribution of the means (the standard error) is equal to the
population standard deviation divided by the square root of the sample size:
$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$.
This holds true if N is very large (or infinite) or if n < 0.05N. If N is finite and n ≥ 0.05N,
$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} \cdot \sqrt{\dfrac{N-n}{N-1}}$.
The expression $\sqrt{\dfrac{N-n}{N-1}}$ is called the finite population correction factor (or finite
population multiplier). In the calculation of the standard error of the mean, if the population standard
deviation $\sigma$ is unknown, the standard error of the mean, $\sigma_{\bar{X}}$, can be estimated by the
sample standard error of the mean, $S_{\bar{X}}$, which is calculated as follows:
$S_{\bar{X}} = \dfrac{S}{\sqrt{n}}$ or $S_{\bar{X}} = \dfrac{S}{\sqrt{n}} \cdot \sqrt{\dfrac{N-n}{N-1}}$.
3. The sampling distribution of means is approximately normal for sufficiently large sample sizes (n ≥ 30).
Example:
A population consists of the following ages: 10, 20, 30, 40, and 50. A random sample of three is to be
selected from this population and mean computed. Develop the sampling distribution of the mean.
Solution:
The number of simple random samples of size n that can be drawn without replacement from a population
of size N is $\binom{N}{n}$. With N = 5 and n = 3, $\binom{5}{3}$ = 10 samples can be drawn from the population:
Sampled items    Sample mean ($\bar{X}$)
10, 20, 30       20.00
10, 20, 40       23.33
10, 20, 50       26.67
10, 30, 40       26.67
10, 30, 50       30.00
10, 40, 50       33.33
20, 30, 40       30.00
20, 30, 50       33.33
20, 40, 50       36.67
30, 40, 50       40.00
Total            300.00
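A systematic organization of the above figures gives the following:
Sample mean ($\bar{X}$)    Frequency    Prob. (relative freq.) of $\bar{X}$
20.00                      1            0.1
23.33                      1            0.1
26.67                      2            0.2
30.00                      2            0.2
33.33                      2            0.2
36.67                      1            0.1
40.00                      1            0.1
Total                      10           1.00
Columns 1 and 2 show the frequency distribution of sample means.
Columns 1 and 3 show the sampling distribution of the mean.
The population mean and the mean of the sampling distribution are
$\mu = \dfrac{\sum X}{N} = \dfrac{150}{5} = 30$ and $\mu_{\bar{X}} = \dfrac{\sum \bar{X}}{10} = \dfrac{300}{10} = 30$.
Regardless of the sample size, $\mu = \mu_{\bar{X}}$.
The population standard deviation is
$\sigma = \sqrt{\dfrac{\sum (X_i - \mu)^2}{N}} = \sqrt{\dfrac{1000}{5}} = 14.142$.
The standard error of the mean, using the finite population correction factor, is
$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} \cdot \sqrt{\dfrac{N-n}{N-1}} = \dfrac{14.142}{\sqrt{3}} \cdot \sqrt{\dfrac{5-3}{5-1}} = 5.774$,
which agrees with the value computed directly from the ten sample means:
$\sigma_{\bar{X}} = \sqrt{\dfrac{\sum (\bar{X}_i - \mu_{\bar{X}})^2}{10}} = \sqrt{\dfrac{333.4}{10}} = 5.774$.
Since averaging reduces variability, $\sigma_{\bar{X}} < \sigma$ except in the cases where $\sigma = 0$ or n = 1.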
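The worked example can be checked by brute force. The Python sketch below (variable names are illustrative) enumerates every sample of three ages, tabulates the sample means, and compares the standard error computed directly from those means with the value given by the finite population correction formula.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

ages = [10, 20, 30, 40, 50]   # the population from the example
N, n = len(ages), 3

# Mean of every possible sample of size n drawn without replacement
means = [sum(s) / n for s in combinations(ages, n)]
distribution = Counter(round(m, 2) for m in means)   # sample mean -> frequency

mu = sum(ages) / N                    # population mean
mu_xbar = sum(means) / len(means)     # mean of the sampling distribution
sigma = sqrt(sum((x - mu) ** 2 for x in ages) / N)

# Standard error: directly from the sample means, and from the formula with the FPC
se_direct = sqrt(sum((m - mu_xbar) ** 2 for m in means) / len(means))
se_formula = sigma / sqrt(n) * sqrt((N - n) / (N - 1))

for value, freq in sorted(distribution.items()):
    print(f"{value:6.2f}  {freq}  {freq / len(means):.1f}")
print(mu, mu_xbar, round(sigma, 3), round(se_direct, 3), round(se_formula, 3))
```

Both routes give a standard error of about 5.774, and the mean of the ten sample means equals the population mean of 30.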
Central Limit Theorem and the Sampling Distribution of the Mean
The Central Limit Theorem (CLT) states that:
1. If the population is normally distributed, the distribution of sample means is normal regardless of the
sample size.
2. If the population from which samples are taken is not normal, the distribution of sample means will be
approximately normal if the sample size (n) is sufficiently large (n ≥ 30). The larger the sample size is
used, the closer the sampling distribution is to the normal curve.
The relationship between the shape of the population distribution and the shape of the sampling
distribution of the mean is called the Central Limit Theorem.
The significance of the Central Limit Theorem is that it permits us to use sample statistics to make
inferences about population parameters without knowing anything about the shape of the frequency
distribution of that population other than what we can get from the sample. It also permits us to use the
normal distribution (curve) for analyzing distributions whose shape is unknown. It creates the potential
for applying the normal distribution to many problems when the sample is sufficiently large.
Example:
1. The distribution of annual earnings of all bank tellers with five years of experience is negatively
skewed, with a mean of Birr 15,000 and a standard deviation of Birr 2,000. If we draw a random
sample of 30 tellers, what is the probability that their earnings will average more than Birr 15,750
annually?
Solution:
Steps:
1. Calculate $\mu$ and $\sigma_{\bar{X}}$
$\mu$ = Birr 15,000 and $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 2000/\sqrt{30}$ = Birr 365.15
2. Calculate Z for $\bar{X}$
$Z = \dfrac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}$, so $Z_{15,750} = \dfrac{15,750 - 15,000}{365.15} = +2.05$
3. Find the area covered by the interval
P($\bar{X}$ > 15,750) = P(Z > +2.05)
= 0.5 - P(0 to +2.05)
= 0.5 - 0.4798
= 0.0202
4. Interpret the results
There is a 2.02% chance that the average earnings of a group of 30 tellers will be more than
Birr 15,750 annually.
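The same four steps can be reproduced with Python's standard library. This is a sketch only: the helper name `prob_mean_above` is hypothetical, and because the code works with an unrounded Z, its result can differ from the table-based answer in the third or fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_above(x_bar, mu, sigma, n):
    """Return (Z, P(sample mean > x_bar)) using the normal approximation."""
    se = sigma / sqrt(n)              # standard error of the mean
    z = (x_bar - mu) / se             # standardize the sample mean
    return z, 1 - NormalDist().cdf(z) # upper-tail area beyond Z

z, p = prob_mean_above(15_750, 15_000, 2_000, 30)
print(round(z, 2), round(p, 4))
```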
2. Suppose that during any hour in a large department store, the average number of shoppers is 448, with
a standard deviation of 21 shoppers. What is the probability of randomly selecting 49 different
shopping hours, counting the shoppers, and having the sample mean fall between 441 and 446 shoppers,
inclusive?
Solution:
1st. Calculate $\mu$ and $\sigma_{\bar{X}}$
$\mu$ = 448 shoppers
$\sigma_{\bar{X}} = \sigma/\sqrt{n} = 21/\sqrt{49} = 3$
2nd. Calculate Z for $\bar{X}$
$Z = \dfrac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}$
$Z_{441} = \dfrac{441 - 448}{3} = -2.33$ and $Z_{446} = \dfrac{446 - 448}{3} = -0.67$
3rd. Find the area covered by the interval
P(441 ≤ $\bar{X}$ ≤ 446) = P(-2.33 ≤ Z ≤ -0.67)
= P(0 to -2.33) - P(0 to -0.67)
= 0.4901 - 0.2486
= 0.2415
4th. Interpret the results
There is a 24.15% chance of randomly selecting 49 hourly periods for which the sample mean falls
between 441 and 446 shoppers.
3. A production company's 350 hourly employees average 37.6 years of age, with a standard deviation of 8.3
years. If a random sample of 45 hourly employees is taken, what is the probability that the sample will have
an average age of less than 40 years?
Solution:
1st. Calculate $\mu$ and $\sigma_{\bar{X}}$
$\mu$ = 37.6 years; n/N = 45/350 > 5%, so the finite population correction factor is needed.
$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} \cdot \sqrt{\dfrac{N-n}{N-1}} = \dfrac{8.3}{\sqrt{45}} \cdot \sqrt{\dfrac{350-45}{350-1}} = 1.16$
2nd. Calculate Z for $\bar{X}$
$Z = \dfrac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}$, so $Z_{40} = \dfrac{40 - 37.6}{1.16} = +2.07$
3rd. Find the area covered by the interval
P($\bar{X}$ < 40) = P(Z < +2.07)
= 0.5 + P(0 to +2.07)
= 0.5 + 0.4808
= 0.9808
4th. Interpret the results
There is a 98.08% chance that 45 randomly selected hourly employees will have a mean age of less than 40
years.
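A small helper makes the role of the finite population correction factor explicit. The sketch below is illustrative (the function name and the optional `N` argument are assumptions, not part of the original notes); it applies the correction only when the population size is given and n is at least 5% of N, following the rule stated earlier.

```python
from math import sqrt
from statistics import NormalDist

def standard_error_mean(sigma, n, N=None):
    """Standard error of the mean, with the FPC when n is at least 5% of a finite N."""
    se = sigma / sqrt(n)
    if N is not None and n >= 0.05 * N:
        se *= sqrt((N - n) / (N - 1))   # finite population correction factor
    return se

se = standard_error_mean(8.3, 45, N=350)
z = (40 - 37.6) / se
p_below_40 = NormalDist().cdf(z)
print(round(se, 2), round(z, 2), round(p_below_40, 4))
```

Without the correction the standard error would be 8.3/√45, about 1.24, so ignoring it here would understate the probability slightly.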
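4. Suppose that a random sample of size 36 is being drawn from a population with a mean of 278. If 86% of
the time the sample mean is less than 280, what is the population standard deviation?
Solution:
$\mu$ = 278, n = 36, P($\bar{X}$ < 280) = 0.86, $\sigma$ = ?
The area between the mean and 280 is 0.86 - 0.50 = 0.36, so Z (for area 0.36) = +1.08.
$Z_{280} = \dfrac{\bar{X} - \mu}{\sigma_{\bar{X}}}$, so $1.08 = \dfrac{280 - 278}{\sigma_{\bar{X}}}$ and $\sigma_{\bar{X}} = \dfrac{2}{1.08} = 1.85$
$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$, so $1.85 = \dfrac{\sigma}{\sqrt{36}} = \dfrac{\sigma}{6}$
$\sigma = 6 \times 1.85 = 11.1$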
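Working backwards from a probability to a standard deviation uses the inverse of the normal table. As a hedged sketch (variable names are illustrative), `NormalDist().inv_cdf` from Python's standard library plays the role of looking up the Z value for a cumulative area of 0.86:

```python
from math import sqrt
from statistics import NormalDist

mu, n, x_bar, prob = 278, 36, 280, 0.86

z = NormalDist().inv_cdf(prob)   # Z with 86% of the area below it (about 1.08)
se = (x_bar - mu) / z            # standard error implied by Z = (x_bar - mu) / se
sigma = se * sqrt(n)             # undo the division by sqrt(n)
print(round(z, 2), round(se, 2), round(sigma, 1))
```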
5. A teacher gives a test to a class containing several hundred students. It is known that the standard
deviation of the scores is about 12 points. A random sample of 36 scores is obtained.
a) What is the probability that the sample mean will differ from the population mean by more than 6
points?
b) What is the probability that the sample mean will be within 6 points of the population mean?
Solution:
a) n = 36, $\sigma$ = 12, $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{12}{\sqrt{36}} = \dfrac{12}{6} = 2$
P($\bar{X}$ > $\mu$ + 6) + P($\bar{X}$ < $\mu$ - 6) = ?
$Z_{\mu+6} = \dfrac{(\mu + 6) - \mu}{2} = \dfrac{6}{2} = +3$ and $Z_{\mu-6} = \dfrac{(\mu - 6) - \mu}{2} = \dfrac{-6}{2} = -3$
P($\bar{X}$ > $\mu$ + 6) + P($\bar{X}$ < $\mu$ - 6) = P(Z > 3) + P(Z < -3)
= [0.5 - P(0 to +3)] + [0.5 - P(0 to -3)]
= (0.5 - 0.4987) + (0.5 - 0.4987)
= 2(0.0013) = 0.0026
b) n = 36, $\sigma$ = 12, $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{12}{\sqrt{36}} = \dfrac{12}{6} = 2$
P($\mu$ - 6 ≤ $\bar{X}$ ≤ $\mu$ + 6) = P(-3 ≤ Z ≤ 3)
= 2 × P(0 to 3)
= 2 × 0.4987
= 0.9974
If the population standard deviation is 12, in a random sample of 36 scores there is a 99.74% chance that
the sample mean score lies within 6 points of the population mean.
Sampling Distribution of Proportions ($\bar{p}$)
Sometimes in statistics it is important to know the proportion of a certain characteristic in a population.
That is, there are numerous problems in business where we want to know the proportion of items in a
population that possess a certain characteristic. For example,
- A quality control manager might want to know what proportion of the products of an assembly line is
defective.
- A labor economist might want to know what proportion of the labor force is unemployed.
Whereas the mean is computed by averaging a set of values, the sample proportion is computed by
dividing the frequency with which a given characteristic occurs in a sample by the number of items in the
sample:
$\bar{p} = \dfrac{X}{n}$
Where $\bar{p}$ = sample proportion
X = number of items in a sample that possess the characteristic
n = number of items in the sample
Like other probability distributions, the sampling distribution of the proportion is described by two
parameters: the mean of the sample proportions, E($\bar{p}$), and the standard deviation of the proportions,
$\sigma_{\bar{p}}$, which is called the standard error of the proportion.
Properties of the Sampling Distribution of $\bar{p}$
1. As with the sampling distribution of the mean, the mean of the sample proportions is always equal to the
population proportion, P, i.e., E($\bar{p}$) = P.
2. The standard error of the proportion is equal to: $\sigma_{\bar{p}} = \sqrt{\dfrac{Pq}{n}}$,
Where P = population proportion
q = 1 - P
n = sample size.
Or $\sigma_{\bar{p}} = \sqrt{\dfrac{Pq}{n}} \cdot \sqrt{\dfrac{N-n}{N-1}}$, where $\sqrt{\dfrac{N-n}{N-1}}$ = finite population correction factor.
The finite population correction factor is not needed if n < 0.05N.
Central Limit Theorem (CLT) and Sampling Distribution of $\bar{p}$
How does a researcher use the sample proportion in analysis?
Answer: By applying the Central Limit Theorem. The CLT states that the normal distribution approximates the
shape of the distribution of sample proportions if nP and nq are both greater than 5. Consequently we solve
problems involving sample proportions by using a normal distribution whose mean and standard deviation
are:
$\mu_{\bar{p}} = P$, $\sigma_{\bar{p}} = \sqrt{\dfrac{Pq}{n}}$ and $Z = \dfrac{\bar{p} - P}{\sigma_{\bar{p}}}$
NB: The sampling distribution of $\bar{p}$ can be approximated by a normal distribution whenever the sample
size is large, i.e., nP and nq > 5.
Example:
1. Suppose that 60% of the electrical contractors in a region use a particular brand of wire. What is the
probability of taking a random sample of size 120 from these electrical contractors and finding that 0.5
or less use that brand of wire?
Solution:
n = 120, P = 0.6, q = 0.4, P($\bar{p}$ ≤ 0.5) = ?
Steps:
1. Check that nP and nq > 5
120 × 0.6 = 72, and 120 × 0.4 = 48. Both are greater than 5.
2. Calculate $\sigma_{\bar{p}}$
$\sigma_{\bar{p}} = \sqrt{\dfrac{Pq}{n}} = \sqrt{\dfrac{0.6 \times 0.4}{120}} = 0.0447$
3. Calculate Z for $\bar{p}$
$Z = \dfrac{\bar{p} - P}{\sigma_{\bar{p}}}$, so $Z_{0.5} = \dfrac{0.5 - 0.6}{0.0447} = -2.24$
4. Find the area covered by the interval
P($\bar{p}$ ≤ 0.5) = P(Z ≤ -2.24)
= 0.5 - P(0 to -2.24)
= 0.5 - 0.4875
= 0.0125
5. Interpret the results
The probability of finding that 50% or less of the contractors use this particular brand of wire is very
low, about 1.25%, if we take a random sample of 120 contractors.
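The steps for a proportion follow the same pattern as for a mean. The Python sketch below is illustrative only (the helper name `standard_error_proportion` is an assumption); it checks the nP and nq condition, computes the standard error, and evaluates the lower-tail probability with an unrounded Z, so the result differs slightly from the table-based 0.0125.

```python
from math import sqrt
from statistics import NormalDist

def standard_error_proportion(P, n, N=None):
    """Standard error of a sample proportion, with the FPC when n >= 5% of a finite N."""
    se = sqrt(P * (1 - P) / n)
    if N is not None and n >= 0.05 * N:
        se *= sqrt((N - n) / (N - 1))   # finite population correction factor
    return se

P, n, p_bar = 0.6, 120, 0.5
normal_ok = n * P > 5 and n * (1 - P) > 5   # condition for the normal approximation

se = standard_error_proportion(P, n)
z = (p_bar - P) / se
prob = NormalDist().cdf(z)                  # P(sample proportion <= 0.5)
print(normal_ok, round(se, 4), round(z, 2), round(prob, 4))
```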
2. If 10% of a population of parts is defective, what is the probability of randomly selecting 80 parts and
finding that 12 or more are defective?
Solution:
n = 80, P = 0.1, q = 0.9, X = 12
$\bar{p}$ = X/n = 12/80 = 0.15
$\sigma_{\bar{p}} = \sqrt{\dfrac{Pq}{n}} = \sqrt{\dfrac{0.10 \times 0.90}{80}} = 0.0335$
P($\bar{p}$ ≥ 0.15) = ?
$Z_{0.15} = \dfrac{0.15 - 0.1}{0.0335} = +1.49$
P($\bar{p}$ ≥ 0.15) = P(Z ≥ +1.49)
= 0.5 - P(0 to +1.49)
= 0.5 - 0.4319 = 0.0681
About 6.81% of the time, twelve or more defective parts would appear in a random sample of eighty
parts when the population proportion is 0.10.
3. Suppose that a population proportion is 0.40 and that 80% of the time you draw a random sample from this
population, you get a sample proportion of 0.35 or more. How large a sample were you taking?
Solution:
P = 0.4, q = 0.6, P($\bar{p}$ ≥ 0.35) = 0.80, n = ?
The area between 0.35 and the mean is 0.80 - 0.50 = 0.30, so Z (for area 0.30) = -0.84.
$Z_{0.35} = \dfrac{0.35 - 0.4}{\sigma_{\bar{p}}}$, so $-0.84 = \dfrac{-0.05}{\sigma_{\bar{p}}}$ and $\sigma_{\bar{p}} = \dfrac{0.05}{0.84} = 0.0595$
$\sigma_{\bar{p}} = \sqrt{\dfrac{Pq}{n}} = \sqrt{\dfrac{0.4 \times 0.6}{n}}$; squaring both sides:
$0.0595^2 = 0.00354 = \dfrac{0.24}{n}$
n = 0.24/0.00354 ≈ 68
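The sample-size calculation can be checked numerically. In this sketch (variable names are illustrative), `inv_cdf(0.20)` returns the Z value with 20% of the area below it, which corresponds to the table value of -0.84 used above.

```python
from statistics import NormalDist

P, p_bar, prob_at_least = 0.40, 0.35, 0.80
q = 1 - P

z = NormalDist().inv_cdf(1 - prob_at_least)  # 20% of the area lies below p_bar
se = (p_bar - P) / z                         # required standard error of the proportion
n = P * q / se ** 2                          # from se = sqrt(P * q / n)
print(round(z, 2), round(se, 4), round(n))
```

The unrounded result is very close to 68, in line with the hand calculation.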
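4. If a population proportion is 0.28 and if the sample size is 140, 30% of the time the sample proportion
will be less than what value if you are taking random samples?
Solution:
P = 0.28, q = 0.72, n = 140, P($\bar{p}$ < $\bar{p}_0$) = 0.30, $\bar{p}_0$ = ?
The area between $\bar{p}_0$ and the mean is 0.50 - 0.30 = 0.20, so Z (for area 0.20) = -0.52.
$\sigma_{\bar{p}} = \sqrt{\dfrac{Pq}{n}} = \sqrt{\dfrac{0.28 \times 0.72}{140}} = 0.0379$
$Z = \dfrac{\bar{p}_0 - P}{\sigma_{\bar{p}}}$, so $-0.52 = \dfrac{\bar{p}_0 - 0.28}{0.0379}$
$-0.52 \times 0.0379 = -0.0197 = \bar{p}_0 - 0.28$
$\bar{p}_0 = 0.26$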
Sampling Distribution of the Difference between Two Independent Sample Means ($\bar{X}_1 - \bar{X}_2$)
This distribution is concerned with the difference between sample means drawn from
two populations; it is used to determine whether the mean of one population is equal to the
mean of another.
N.B: In the sampling distribution of the difference between two means, $\bar{X}_1 - \bar{X}_2$, we are actually
concerned with five different probability distributions.
1st, the two distributions of the two populations, which have means $\mu_1$ and $\mu_2$ and variances
$\sigma_1^2$ and $\sigma_2^2$ respectively.
2nd, the two sampling distributions of $\bar{X}_1$ and $\bar{X}_2$, with $\mu_{\bar{X}_1} = \mu_1$, $\sigma_{\bar{X}_1}$
and $\mu_{\bar{X}_2} = \mu_2$, $\sigma_{\bar{X}_2}$.
3rd, the one sampling distribution of $\bar{X}_1 - \bar{X}_2$, with mean $\mu_1 - \mu_2$ and standard error
$\sigma_{\bar{X}_1 - \bar{X}_2}$.
The sampling distribution of the difference between two sample means, $\bar{X}_1 - \bar{X}_2$, is described by
two parameters.
1. Mean of the difference between two sample means, $\mu_1 - \mu_2$
2. Standard error of the difference between two sample means,
$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\sigma_{\bar{X}_1}^2 + \sigma_{\bar{X}_2}^2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
Variance: $\sigma_{\bar{X}_1 - \bar{X}_2}^2 = \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$
[If X and Y are independent random variables: var(X - Y) = var(X) + var(Y)]
Where, $\sigma_1^2$ = variance of population one
$\sigma_2^2$ = variance of population two
n1 = sample size drawn from population one
n2 = sample size drawn from population two
If more than 5% of the population is sampled without replacement, we must apply the finite
population correction factor and the formula becomes:
$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} \cdot \dfrac{N_1 - n_1}{N_1 - 1} + \dfrac{\sigma_2^2}{n_2} \cdot \dfrac{N_2 - n_2}{N_2 - 1}}$
The Central Limit Theorem and the sampling distribution of $\bar{X}_1 - \bar{X}_2$
The central limit theorem states that:
If n1 and n2 are both greater than 30, the distribution of $\bar{X}_1 - \bar{X}_2$ will be approximately normal
no matter how the original populations are distributed.
If the original populations are normally distributed, then the distribution of $\bar{X}_1 - \bar{X}_2$ is exactly
normal for any values of n1 and n2. This is because the sum or difference of
independent normal variables is normally distributed.
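A small simulation can make the first statement concrete. The sketch below is an illustration under stated assumptions: the two populations are exponential (strongly skewed) with means 2 and 5, the sample sizes are 36 each, and the trial count and seed are arbitrary. If the difference of sample means is approximately normal, about 95% of the simulated differences should fall within 1.96 standard errors of $\mu_1 - \mu_2$.

```python
import math
import random

def simulate_coverage(n1=36, n2=36, trials=5000, seed=1):
    """Fraction of simulated differences xbar1 - xbar2 that lie within
    1.96 standard errors of mu1 - mu2 for two exponential populations."""
    rng = random.Random(seed)
    mean1, mean2 = 2.0, 5.0   # for an exponential, std dev equals the mean
    se = math.sqrt(mean1 ** 2 / n1 + mean2 ** 2 / n2)
    hits = 0
    for _ in range(trials):
        xbar1 = sum(rng.expovariate(1 / mean1) for _ in range(n1)) / n1
        xbar2 = sum(rng.expovariate(1 / mean2) for _ in range(n2)) / n2
        z = ((xbar1 - xbar2) - (mean1 - mean2)) / se
        if abs(z) <= 1.96:
            hits += 1
    return hits / trials

print(simulate_coverage())
```

The printed fraction should be close to 0.95 even though neither population is normal.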

To standardize a difference between two sample means $\bar{X}_1 - \bar{X}_2$, we use the following formula:
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{X}_1 - \bar{X}_2}}$$

Example:
1. A financial loan officer claims that the mean monthly payment for credit cards is Br 80 with a
variance of 1400 for single females and Br 80 with a variance of 1320 for single males. You take a
random sample of 100 females (population 1) and an independent random sample of 120 males
(population 2). What is the probability that the sample mean for females will be at least Br 5 higher
than the sample mean for males?
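Solution:
$\mu_1 = 80$, $\mu_2 = 80$, $\sigma_1^2 = 1400$, $\sigma_2^2 = 1320$, n1 = 100, n2 = 120
$P(\bar{x}_1 - \bar{x}_2 \ge 5) = ?$

$E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2 = 80 - 80 = 0$
$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{1400}{100} + \frac{1320}{120}} = \sqrt{14 + 11} = 5$

$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{X}_1 - \bar{X}_2}}$
$Z_5 = \frac{5 - 0}{5} = 1.0$

$P(\bar{x}_1 - \bar{x}_2 \ge 5) = P(Z \ge 1.0)$
= 0.5 - P (0 to +1.00)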
= 0.5 - 0.3413 = 0.1587
There is a 15.87% chance that the sample mean monthly credit card payment for single females will
be higher than that for single males by at least Birr 5.


2. MOHA soft drinks factory produces two soft drinks: 7 up and Pepsi-cola. The daily production of 7
up averages $\mu_1 = 15{,}000$ bottles and is normally distributed with a standard deviation of
$\sigma_1 = 2{,}000$ bottles. The daily production of Pepsi-cola averages $\mu_2 = 12{,}500$ bottles with a
standard deviation of $\sigma_2 = 2{,}500$ bottles. A sample of five randomly
selected daily production figures is taken from each of the plants. What is the probability that the
sample mean production for 7 up will be less than or equal to the sample mean production for
Pepsi-cola?
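Solution:
$\mu_1 = 15{,}000$ bottles, $\mu_2 = 12{,}500$ bottles, $\sigma_1 = 2{,}000$ bottles, $\sigma_2 = 2{,}500$ bottles, n1 = 5, n2 = 5
$P(\bar{x}_1 - \bar{x}_2 \le 0) = ?$

$E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2 = 15{,}000 - 12{,}500 = 2{,}500$
$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{2000^2}{5} + \frac{2500^2}{5}} = 1{,}431.78$

$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{X}_1 - \bar{X}_2}}$
$Z_0 = \frac{0 - 2{,}500}{1{,}431.78} = -1.75$

$P(\bar{x}_1 - \bar{x}_2 \le 0) = P(Z \le -1.75)$
= 0.5 - P (0 to -1.75)
= 0.5 - 0.4599
= 0.0401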
Thus, there is only a 4.01% chance that the sample mean production for 7 up will be smaller than the
sample mean production for Pepsi-cola. So, if the owner of the two plants found the first sample
mean to be smaller than the second (say, with $\bar{x}_2 = 13{,}500$ bottles) in independent random samples of
five randomly selected days from each plant, he would suspect that either the sampling was faulty or
that the difference in the plants' mean daily outputs had changed.
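The same probability can be approximated by simulating the sampling process itself. This is a rough sketch under the example's stated parameters, assuming both daily production figures are normally distributed; the number of trials and the random seed are arbitrary choices, so the estimate will only be near the calculated value, not equal to it.

```python
import random

def estimate_prob(trials=20000, seed=7):
    """Monte Carlo estimate of P(xbar1 <= xbar2) for samples of five
    days from each plant (normal daily production assumed)."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        # Draw five daily figures per product and average them.
        xbar1 = sum(rng.gauss(15000, 2000) for _ in range(5)) / 5
        xbar2 = sum(rng.gauss(12500, 2500) for _ in range(5)) / 5
        if xbar1 <= xbar2:
            count += 1
    return count / trials

print(estimate_prob())
```

The estimate should land close to the 0.04 obtained from the normal table.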

3. X Company claims that the mean annual repair bill for its rental cars is Br 290 and the standard
deviation is Br 50. Y Company also claims its mean annual repair bill is Br 290 and the standard
deviation is Br 50. If independent random samples of 100 cars from each company are obtained,
what is the probability that $|\bar{x}_1 - \bar{x}_2|$ exceeds Br 5?
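Solution:
$\mu_1 = 290$, $\mu_2 = 290$, $\sigma_1 = 50$, $\sigma_2 = 50$, n1 = 100, n2 = 100
$P(|\bar{X}_1 - \bar{X}_2| > 5) = ?$

$E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2 = 290 - 290 = 0$
$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{50^2}{100} + \frac{50^2}{100}} = 7.071$

$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{X}_1 - \bar{X}_2}}$
$Z_5 = \frac{5 - 0}{7.071} = 0.71$

$P(|\bar{x}_1 - \bar{x}_2| > 5) = P(Z > 0.71) \times 2$
= [0.5 - P (0 to 0.71)] × 2
= [0.5 - 0.2611] × 2
= 0.4778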
There is a 47.78% chance that the sample mean annual repair bills of X and Y
companies differ by more than Br 5.
4. Two populations of measurements are normally distributed with $\mu_1 = 57$ and
$\mu_2 = 25$. The two population standard deviations are $\sigma_1 = 12$ and $\sigma_2 = 6$. Two independent
samples of n1 = n2 = 36 are taken from the populations.
a. What is the expected value of the difference in sample means, $\bar{x}_1 - \bar{x}_2$?
b. What is the standard deviation of the distribution of $\bar{x}_1 - \bar{x}_2$?
c. What is the shape of the distribution of $\bar{x}_1 - \bar{x}_2$? How do you know?
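Solution:
a. Expected value of the difference in sample means: $E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2 = 57 - 25 = 32$
b. The standard deviation of the distribution of $\bar{x}_1 - \bar{x}_2$ is the standard error of the difference
between two sample means:
$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{12^2}{36} + \frac{6^2}{36}} = \sqrt{4 + 1} = 2.24$
c. The distribution of $\bar{x}_1 - \bar{x}_2$ is normal. Because both populations are normally distributed, the
difference of the sample means is exactly normal; in addition, both n1 and n2 are greater than 30.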
5. Two production processes are, on the average, identical. Both use an average of
$\mu_1 = \mu_2 = 500$ kg of raw material per day. Both have the same standard deviation of daily
use, $\sigma_1 = \sigma_2 = 9$ kg per day. Thus the daily use of material may vary for the two processes,
but on the average they are the same. Samples of n1 = 81 and n2 = 36 days are taken from the two
processes. Find the probability that the two sample means differ by no more than 1.0 kg.
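Solution:
$\mu_1 = 500$, $\mu_2 = 500$, $\sigma_1 = 9$, $\sigma_2 = 9$, n1 = 81, n2 = 36
$P(|\bar{X}_1 - \bar{X}_2| \le 1) = P(-1 \le \bar{X}_1 - \bar{X}_2 \le 1) = ?$

$E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2 = 500 - 500 = 0$
$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{9^2}{81} + \frac{9^2}{36}} = \sqrt{1 + 2.25} = 1.8028$

$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{X}_1 - \bar{X}_2}}$
$Z_1 = \frac{1 - 0}{1.8028} = 0.55$

$P(|\bar{x}_1 - \bar{x}_2| \le 1) = P(-0.55 \le Z \le 0.55) = P(0 \text{ to } 0.55) \times 2$
= [0.2088] × 2
= 0.4176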
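A probability for an interval such as $-1 \le \bar{X}_1 - \bar{X}_2 \le 1$ is the area between the two z-values. The sketch below recomputes this example from its stated parameters; the helper name `normal_cdf` is illustrative, and the z-value is also rounded to two decimals to mimic a table lookup, where the area from 0 to 0.55 is 0.2088.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Parameters of Example 5: equal means, sigma = 9, n1 = 81, n2 = 36.
se = math.sqrt(9 ** 2 / 81 + 9 ** 2 / 36)
z = (1.0 - 0.0) / se

# Area between -z and +z, first with the unrounded z ...
p_exact = normal_cdf(z) - normal_cdf(-z)

# ... then with z rounded to two decimals, as in a table lookup.
z_rounded = round(z, 2)
p_table = normal_cdf(z_rounded) - normal_cdf(-z_rounded)

print(round(se, 4), z_rounded, p_table, p_exact)
```

Both probabilities come out at roughly 0.42; the small gap between them is due only to rounding z before the lookup.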
Sampling Distribution of the Difference between Two Sample Proportions ($\bar{p}_1 - \bar{p}_2$)

Frequently we are interested in determining if the proportion of items in one population that
possess a certain characteristic is the same as the proportion possessing the characteristic in
another population. For example, a doctor who gives one type of medicine to some patients and
another medicine to others may want to determine if the percentage of people cured by the first
medicine is the same as the percentage of people cured by the second. For this and other similar
cases the sampling distribution of the difference between sample proportions is used. It is also
used to compare proportions of market share and proportions of votes.
Suppose we take independent samples of size n1 and n2 from two populations. Let $P_1$ and $P_2$ be
the proportions of items in each population that possess a certain characteristic, and let
$q_1 = 1 - P_1$ and $q_2 = 1 - P_2$.
If $n_1 P_1$ and $n_1 q_1$ are greater than 5, and $n_2 P_2$ and $n_2 q_2$ are greater than 5, then the random variable
$\bar{p}_1 - \bar{p}_2$ (the difference between the two sample proportions) is approximately normally distributed with
Mean: $E(\bar{p}_1 - \bar{p}_2) = P_1 - P_2$; and
Variance: $Var(\bar{p}_1 - \bar{p}_2) = \frac{P_1 q_1}{n_1} + \frac{P_2 q_2}{n_2}$; if $\frac{n_1}{N_1} > 0.05$ or $\frac{n_2}{N_2} > 0.05$, the finite
population correction factor is used.
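To standardize a difference between two sample proportions $\bar{p}_1 - \bar{p}_2$, we use the following
formula:
$$Z = \frac{(\bar{p}_1 - \bar{p}_2) - (P_1 - P_2)}{\sigma_{\bar{p}_1 - \bar{p}_2}}$$
where $\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{P_1 q_1}{n_1} + \frac{P_2 q_2}{n_2}}$.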
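The pieces above (the normality condition, the standard error, and the standardization) can be sketched as three small functions. The function names and the numbers in the usage lines are hypothetical, chosen only to show how the pieces fit together.

```python
import math

def normal_approx_ok(n, p):
    """Rule of thumb from the text: both n*p and n*q must exceed 5."""
    return n * p > 5 and n * (1 - p) > 5

def se_diff_proportions(p1, n1, p2, n2):
    """Standard error of the difference between two sample proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

def z_diff_proportions(diff, p1, n1, p2, n2):
    """Standardize an observed difference of sample proportions."""
    return (diff - (p1 - p2)) / se_diff_proportions(p1, n1, p2, n2)

# Hypothetical populations: P1 = 0.30 with n1 = 200, P2 = 0.25 with n2 = 150.
if normal_approx_ok(200, 0.30) and normal_approx_ok(150, 0.25):
    print(z_diff_proportions(0.10, 0.30, 200, 0.25, 150))
```

Checking the condition first matters because the Z formula is only meaningful when the normal approximation is reasonable for both samples.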

Example:
1. At Addis Ababa University there is a movement to re-establish the Students' Union. Approximately
90% of all students favor the reinstatement. A pro-union student takes a random sample of
100 students. An anti-union student takes an independent random sample of 100 students. Let
$\bar{p}_1$ denote the proportion of students who favor the union in the sample taken by the pro-union
student and $\bar{p}_2$ denote the proportion of students who favor the union in the sample taken by the
anti-union student. Calculate the probability that $\bar{p}_1$ exceeds $\bar{p}_2$ by 0.1 or more.
Solution:
Pro-union      Anti-union
P1 = 0.9       P2 = 0.9
q1 = 0.1       q2 = 0.1
n1 = 100       n2 = 100

$P(\bar{p}_1 - \bar{p}_2 \ge 0.10) = ?$

$E(\bar{p}_1 - \bar{p}_2) = P_1 - P_2 = 0.9 - 0.9 = 0$
$\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{P_1 q_1}{n_1} + \frac{P_2 q_2}{n_2}} = \sqrt{\frac{0.9 \times 0.1}{100} + \frac{0.9 \times 0.1}{100}} = 0.04243$

$Z = \frac{(\bar{p}_1 - \bar{p}_2) - (P_1 - P_2)}{\sigma_{\bar{p}_1 - \bar{p}_2}}$
$Z_{0.1} = \frac{0.1 - 0}{0.04243} = 2.36$

$P(\bar{p}_1 - \bar{p}_2 \ge 0.10) = P(Z \ge 2.36)$
= 0.5 - P (0 to +2.36)
= 0.5 - 0.4909
= 0.0091
2. A TV channel airs two talk shows: Talk-show 1 and Talk-show 2. On a Sunday afternoon, a
random sample of 400 people is taken to estimate $P_1$, the proportion of the population that
watched Talk-show 1 on the TV channel. On the following Sunday, an independent random
sample of 400 people is taken to estimate $P_2$, the proportion of the population that watched
Talk-show 2 on the TV channel. If $P_1 = 0.6$ and $P_2 = 0.5$, find the probability that $\bar{p}_1 \ge \bar{p}_2$ in our
samples. That is, find $P(\bar{p}_1 - \bar{p}_2 \ge 0)$.



Solution:
Talk-show 1    Talk-show 2
P1 = 0.6       P2 = 0.5
q1 = 0.4       q2 = 0.5
n1 = 400       n2 = 400

$P(\bar{p}_1 - \bar{p}_2 \ge 0) = ?$

$E(\bar{p}_1 - \bar{p}_2) = P_1 - P_2 = 0.6 - 0.5 = 0.10$
$\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{P_1 q_1}{n_1} + \frac{P_2 q_2}{n_2}} = \sqrt{\frac{0.6 \times 0.4}{400} + \frac{0.5 \times 0.5}{400}} = 0.035$

$Z = \frac{(\bar{p}_1 - \bar{p}_2) - (P_1 - P_2)}{\sigma_{\bar{p}_1 - \bar{p}_2}}$
$Z_0 = \frac{0 - 0.1}{0.035} = -2.86$

$P(\bar{p}_1 - \bar{p}_2 \ge 0) = P(Z \ge -2.86)$
= 0.5 + P (0 to -2.86)
= 0.5 + 0.4979
= 0.9979
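Both proportion examples can be checked with one small function. This is a minimal sketch: the name `prob_diff_at_least` is illustrative, and the result differs slightly from a table-based answer because z is not rounded to two decimals before the probability is computed.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_diff_at_least(d, p1, n1, p2, n2):
    """P(pbar1 - pbar2 >= d) under the normal approximation."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = (d - (p1 - p2)) / se
    return 1.0 - normal_cdf(z)

# Example 1: P1 = P2 = 0.9, n1 = n2 = 100, difference of at least 0.10.
example_1 = prob_diff_at_least(0.10, 0.9, 100, 0.9, 100)

# Example 2: P1 = 0.6, P2 = 0.5, n1 = n2 = 400, difference of at least 0.
example_2 = prob_diff_at_least(0.0, 0.6, 400, 0.5, 400)

print(example_1, example_2)
```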
