Chap1 B Stat
CHAPTER 1
Sampling and sampling distribution
Finite Population
Statisticians recommend selecting a probability sample when sampling from a
finite population because a probability sample allows them to make valid
statistical inferences about the population. The simplest type of probability
sample is one in which each sample of size n has the same probability of being
selected. It is called a simple random sample.
Infinite Population
Sometimes we want to select a sample from a population, but the population is
infinitely large or the elements of the population are being generated by an
ongoing process for which there is no limit on the number of elements that can be
generated. In such cases it is not possible to develop a list of all the elements in the
population, so we cannot select a simple random sample: we cannot
construct a frame consisting of all the elements. Instead, statisticians recommend
selecting what is called a random sample.
A random sample of size n from an infinite population is a sample selected
such that the following conditions are satisfied:
1. Each element selected comes from the same population.
2. Each element is selected independently.
Situations involving sampling from an infinite population are usually
associated with a process that operates over time. Examples include parts
being manufactured on a production line, repeated experimental trials in a
laboratory, transactions occurring at a bank, telephone calls arriving at a
technical support center, and customers entering a retail store. In each case,
the situation may be viewed as a process that generates elements from an
infinite population. As long as the sampled elements are selected from the
same population (i.e., they are selected at approximately the same point in
time) and are selected independently (i.e., the selection of one element does not
affect the next selection), the sample is considered a random sample from an
infinite population.
Systematic Sampling
A complete list of all elements within the population (the sampling frame) is
required, and the items or individuals of the population are arranged in some
order. In systematic sampling, a random starting point is selected using a
simple random sampling method, and then every kth member of the population
is selected.
Step 3: Starting with the randomly selected starting number j, select every kth
unit until all n units are selected. The jth unit is selected first, then the
(j + k)th, (j + 2k)th, and so on, until the required sample size is reached.
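The selection rule in Step 3 can be sketched in Python. This is a minimal illustration rather than a definitive procedure: the function name `systematic_sample`, the interval k = N // n, and drawing the random start j from 1 to k are assumptions made for the sketch, since the earlier steps are not shown here.

```python
import random

def systematic_sample(frame, n, rng=random):
    """Select n units from an ordered frame by taking every kth unit.

    Assumption: the interval is k = N // n and the random start j is
    drawn uniformly from 1..k (1-based positions in the frame).
    """
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and the population size")
    k = N // n                      # sampling interval (assumed)
    j = rng.randint(1, k)           # random starting position, 1..k
    # Positions j, j + k, j + 2k, ... until n units are selected
    positions = [j + i * k for i in range(n)]
    return [frame[p - 1] for p in positions]

population = list(range(1, 101))    # hypothetical units labelled 1..100
sample = systematic_sample(population, 10, random.Random(1))
print(sample)
```

With 100 units and n = 10 the interval is 10, so the selected labels always differ by 10 regardless of the starting point.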
Stratified Random Sampling
When a population can be clearly divided into groups based on some
characteristic, stratified random sampling can be used to guarantee
that each group is represented in the sample. The groups are called
strata. Once the strata are defined, we can apply simple random sampling
within each stratum to collect the sample. Elements within the same stratum
should be more or less homogeneous, while elements in different strata should
differ. Stratified sampling is applied when the population is heterogeneous.
Some of the criteria for dividing a population into strata are: Sex (male, female);
Age (under 18, 18 to 28, 29 to 39); Occupation (blue-collar, white-collar,
others).
Example 1: A population has 25 students, of whom 15 are white and 10 are black.
How many white and black students should a stratified sample of size 10 contain?
Let N = population size, N1 = number of black students, N2 = number of white
students, and n = sample size.
(N1/N) × n = (10/25) × 10 = 4 black students and (N2/N) × n = (15/25) × 10 = 6
white students. With 4 and 6 students from the two strata we have a
representative sample.
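The allocation in Example 1 can be checked with a short Python sketch. The function name `proportional_allocation` and the dictionary layout are illustrative assumptions; the rule itself is the (N_h / N) × n proportion used in the example.

```python
def proportional_allocation(strata_sizes, n):
    """Return the sample size for each stratum as (N_h / N) * n."""
    N = sum(strata_sizes.values())
    return {name: n * size / N for name, size in strata_sizes.items()}

# Example 1: 10 black and 15 white students, sample size 10
allocation = proportional_allocation({"black": 10, "white": 15}, 10)
print(allocation)  # {'black': 4.0, 'white': 6.0}
```

When the proportions do not produce whole numbers, a rounding rule is needed; the sketch leaves that choice open.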
Cluster Sampling
Another common type of sampling is cluster sampling. It is often employed to
reduce the cost of sampling a population scattered over a large geographic
area. In cluster sampling, a population is divided into clusters using naturally
occurring geographic or other boundaries. Then, clusters are randomly selected
and a sample is collected by randomly selecting from each cluster.
Clusters are formed in a way that elements within a cluster are heterogeneous,
i.e. observations in each cluster should be more or less dissimilar. Cluster
sampling is useful when it is difficult or costly to generate a simple random
sample. For example, suppose we want to estimate the average annual household
income in a large city. Simple random sampling would require a complete list of
households in the city from which to sample, and stratified random sampling
would again require that list.
A less expensive way is to let each block within the city represent a cluster. A
sample of clusters could then be randomly selected, and every household
within these clusters could be interviewed to find the average annual
household income.
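The city-block approach can be sketched as follows. This is only an illustration: the income figures, the block labels, and the function name `cluster_sample_mean` are hypothetical, and the estimate is the plain average over every household in the selected blocks.

```python
import random
from statistics import mean

def cluster_sample_mean(clusters, m, rng=random):
    """Randomly select m clusters and average every element in them."""
    chosen = rng.sample(sorted(clusters), m)   # pick cluster labels at random
    values = [x for c in chosen for x in clusters[c]]
    return chosen, mean(values)

# Hypothetical household incomes, grouped by city block
blocks = {
    "block_A": [32000, 41000, 38000],
    "block_B": [55000, 47000],
    "block_C": [29000, 61000, 44000, 52000],
    "block_D": [36000, 40000, 39000],
}
chosen, estimate = cluster_sample_mean(blocks, 2, random.Random(7))
print(chosen, estimate)
```

Only a list of blocks is needed to draw the sample; no list of individual households is required.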
Purposive Sampling
Purposive sampling may be useful when the sample is small; but as the sample
size increases the estimates become unreliable due to accumulation of bias.
The advantage of purposive sampling is that whereas a random sample may
vary widely from the average, a purposive sample will not.
Convenience Sampling
In this method, the decision maker selects a sample from the population in a
manner that is relatively easy and convenient. The researcher samples
whatever units come most readily to hand. In this case, probability is not used
in the sampling at all.
Quota Sampling
In this method, the decision maker requires the sample to contain a certain
number of items with a given characteristic. Many political polls are, in part,
Example: Suppose that in an opinion study you want both men and women to
participate. You know that in the population category of interest, 65% are men
and 35% are women. If your sample size is fixed at 200, you will have a quota
of 130 men (0.65 × 200) and 70 women (0.35 × 200).
e)
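Then the standard deviation of the distribution of the sample mean is equal
to the population standard deviation divided by the square root of the
sample size when the population is infinite. Otherwise, for a finite population
of size N, this quantity is multiplied by the finite population correction factor:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \quad \text{(infinite population)}$$
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} \quad \text{(finite population)}$$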
Note that as we increase the size of the sample, the spread of the distribution
of the sample mean becomes smaller.
Example: The standard deviation of annual salary for the population of 2500
managers is σ = 4000. In this case, the population is finite, with N = 2500. With
a sample size of 30, compute the standard deviation of the sample mean.
Solution: We have n/N = 30/2500 = 0.012. Because the sample size is less than
5% of the population size, we can ignore the finite population correction factor
and use the infinite population formula:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{4000}{\sqrt{30}} = 730.3$$
Take the same example, but with a sample size of 1000 instead of 30, and
compute the standard deviation of the sample mean.
Solution: We have n/N = 1000/2500 = 0.4. Because the sample size is more
than 5% of the population size, we must apply the finite population correction
factor:
$$\sigma_{\bar{x}} = \frac{4000}{\sqrt{1000}} \sqrt{\frac{2500 - 1000}{2500 - 1}} \approx 98$$
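Both salary calculations can be reproduced with a small Python sketch. The function name `standard_error_mean` and the `threshold` parameter are assumptions for illustration; the 5% rule is the one used in the two solutions.

```python
import math

def standard_error_mean(sigma, n, N=None, threshold=0.05):
    """Standard deviation of the sample mean.

    Uses sigma / sqrt(n). When a finite population size N is given and
    n / N exceeds the threshold, the finite population correction
    sqrt((N - n) / (N - 1)) is applied.
    """
    se = sigma / math.sqrt(n)
    if N is not None and n / N > threshold:
        se *= math.sqrt((N - n) / (N - 1))
    return se

se_small = standard_error_mean(4000, 30, N=2500)     # n/N = 0.012, no correction
se_large = standard_error_mean(4000, 1000, N=2500)   # n/N = 0.4, correction applied
print(round(se_small, 1), round(se_large, 1))        # 730.3 98.0
```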
Form of the Sampling Distribution of x̄
The preceding results concerning the expected value and standard deviation for
the sampling distribution of x̄ are applicable for any population. The final step
in identifying the characteristics of the sampling distribution of x̄ is to
determine the form or shape of the sampling distribution. We will consider two
cases: (1) the population has a normal distribution; and (2) the population
does not have a normal distribution.
Population has a normal distribution. In many situations it is reasonable to
assume that the population from which we are selecting a random sample has
a normal, or nearly normal, distribution. When the population has a normal
distribution, the sampling distribution of x̄ is normally distributed for any
sample size.
Population does not have a normal distribution. When the population from
which we are selecting a random sample does not have a normal distribution,
the central limit theorem is helpful in identifying the shape of the sampling
distribution of x̄. A statement of the central limit theorem as it applies to the
sampling distribution of x̄ follows.
Central Limit Theorem: In selecting random samples of size n from a
population, the sampling distribution of the sample mean can be approximated
by a normal distribution as the sample size becomes large.
Given a population of any functional form with mean μ and finite variance σ²,
the sampling distribution of x̄, computed from samples of size n from the
population, will be approximately normally distributed with mean μ and
variance σ²/n when the sample size is large.
From a practitioner's standpoint, we often want to know how large the sample
size needs to be before the central limit theorem applies and we can assume
that the shape of the sampling distribution is approximately normal. Statistical
researchers have investigated this question by studying the sampling
distribution of x̄ for a variety of populations and a variety of sample sizes.
General statistical practice is to assume that, for most applications, the
sampling distribution of x̄ can be approximated by a normal distribution
whenever the sample size is 30 or more.
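The central limit theorem can be illustrated with a small simulation. The sketch below is an assumption-laden example, not part of the original text: it uses an exponential population with mean 1 and standard deviation 1 as a stand-in for a skewed population, a fixed seed for repeatability, and the hypothetical helper `simulate_sample_means`.

```python
import random
from statistics import mean, stdev

def simulate_sample_means(draw, n, repetitions, rng):
    """Return the means of `repetitions` random samples of size n."""
    return [sum(draw(rng) for _ in range(n)) / n for _ in range(repetitions)]

rng = random.Random(42)

def draw(r):
    # Assumed skewed population: exponential with mean 1 and sd 1
    return r.expovariate(1.0)

means = simulate_sample_means(draw, n=30, repetitions=5000, rng=rng)
print(round(mean(means), 3))    # should be close to the population mean, 1
print(round(stdev(means), 3))   # should be close to 1 / sqrt(30), about 0.183
```

A histogram of `means` would look roughly bell-shaped even though the population itself is strongly skewed.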
Example- The distribution of annual earnings of all bank tellers with five years
of experience is skewed negatively. This distribution has a mean of Birr 15,000
and a standard deviation of Birr 2000. If we draw a random sample of 30
tellers, what is the probability that their earnings will average more than Birr
15,750 annually?
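Solution:
Steps:
1. Calculate µ and σ_x̄
µ = Birr 15,000
σ_x̄ = σ/√n = 2000/√30 = Birr 365.15
2. Calculate Z for x̄
$$Z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{15{,}750 - 15{,}000}{365.15} \approx 2.05$$
3. Find the area to the right of Z = 2.05 under the standard normal curve
P(x̄ > 15,750) = P(Z > 2.05) = 1 − 0.9798 = 0.0202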
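The bank teller calculation can be verified numerically. This sketch assumes Python 3.8 or later for `statistics.NormalDist`; the function name `prob_mean_exceeds` is hypothetical. Because it does not round Z before looking up the probability, the result differs slightly from a table-based answer.

```python
import math
from statistics import NormalDist

def prob_mean_exceeds(x_bar, mu, sigma, n):
    """Return (Z, P(sample mean > x_bar)) under the normal approximation."""
    se = sigma / math.sqrt(n)           # standard deviation of the sample mean
    z = (x_bar - mu) / se
    return z, 1 - NormalDist().cdf(z)   # upper-tail area of the standard normal

z, p = prob_mean_exceeds(15750, 15000, 2000, 30)
print(round(z, 2), round(p, 3))
```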
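Standard deviation of p̄
Just like the standard deviation of the mean, the standard deviation of the
sample proportion p̄ depends on whether the population is finite or infinite.
The two formulas for computing the standard deviation of p̄ are as follows:
$$\sigma_{\bar{p}} = \sqrt{\frac{p(1 - p)}{n}} \quad \text{(infinite population)}$$
$$\sigma_{\bar{p}} = \sqrt{\frac{N - n}{N - 1}} \sqrt{\frac{p(1 - p)}{n}} \quad \text{(finite population)}$$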
Example: Let’s take the previous situation again. The population proportion of
managers who participated in the management training program is p = 0.60, with a
sample size of 30 and a population of 2500.
Because n/N = 30/2500 = 0.012, we can ignore the finite population correction
factor when we compute the standard error of the proportion. For the simple
random sample of 30 managers, the standard error is
√(0.60 × (1 − 0.60)/30) = 0.0894.
To determine whether the sample size is large enough for a normal
approximation, check two conditions: np ≥ 5 and n(1 − p) ≥ 5. The sampling
distribution of p̄ can be approximated by a normal distribution whenever both
conditions hold.
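The standard error of the proportion and the sample-size check can be sketched together. The function names and the `threshold` parameter are illustrative assumptions; the 5% rule and the np ≥ 5, n(1 − p) ≥ 5 conditions are the ones stated in the text.

```python
import math

def standard_error_proportion(p, n, N=None, threshold=0.05):
    """Standard deviation of the sample proportion.

    Applies the finite population correction only when N is given
    and n / N exceeds the threshold.
    """
    se = math.sqrt(p * (1 - p) / n)
    if N is not None and n / N > threshold:
        se *= math.sqrt((N - n) / (N - 1))
    return se

def normal_approximation_ok(p, n):
    """Check the conditions np >= 5 and n(1 - p) >= 5."""
    return n * p >= 5 and n * (1 - p) >= 5

se = standard_error_proportion(0.60, 30, N=2500)   # n/N = 0.012, no correction
ok = normal_approximation_ok(0.60, 30)             # np = 18, n(1 - p) = 12
print(round(se, 4), ok)                            # 0.0894 True
```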
Exercises
1. The mean undergraduate cost for tuition, fees, room, and board for four-
year institutions was $26,489 for a recent academic year. Suppose that
the standard deviation is $3,204 and that 36 four-year institutions are
randomly selected. Find the probability that the sample mean cost for these
36 schools is
A. What is the probability that the sample mean is between 7.8 and 8.2
minutes?
B. What is the probability that the sample mean is between 7.5 and 8
minutes?
C. If you select a random sample of 100 sessions, what is the probability
that the sample mean is between 7.8 and 8.2 minutes?
3. Suppose that during any hour in a large department store, the average
number of shoppers is 448, with a standard deviation of 21 shoppers. What
is the probability of randomly selecting 49 different shopping hours,
counting the shoppers, and having the sample mean fall between 441 and
446 shoppers, inclusive?
4. The U.S. Census Bureau announced that the median sales price of new
houses sold in 2009 was $215,600, and the mean sales price was $270,100.
Assume that the standard deviation of the prices is $90,000.
A. If you select a random sample of n=100 what is the probability that the
sample mean will be less than $300,000?
B. If you select a random sample of n= 100 what is the probability that the
sample mean will be between $275,000 and $290,000?
5. A population has a mean of 200 and a standard deviation of 50. Suppose a
simple random sample of size 100 is selected and is used to estimate μ.
A. What is the probability that the sample mean will be within ±5 of the
population mean?
B. What is the probability that the sample mean will be within ±10 of the
population mean?
6. A population proportion is .40. A simple random sample of size 200 will be
taken and the sample proportion will be used to estimate the population
proportion.
A. Compute the standard error of the proportion
B. What is the probability that the sample proportion will be within ±.03 of
the population proportion?
C. What is the probability that the sample proportion will be within ±.05 of
the population proportion?
7. In a recent survey of full-time female workers ages 22 to 35 years, 46% said
that they would rather give up some of their salary for more personal time.
Suppose you select a sample of 100 full-time female workers 22 to 35 years
old.
A. What is the probability that in the sample, fewer than 50% would rather
give up some of their salary for more personal time?
B. What is the probability that in the sample, between 40% and 50% would
rather give up some of their salary for more personal time?
C. What is the probability that in the sample, more than 40% would rather
give up some of their salary for more personal time?