Lecture 3: Sampling and Sample Distribution

I. Introduction to sampling distribution
II. Sampling distribution of the mean
III. Sampling distribution of proportion
SAMPLING DISTRIBUTION
There are three distinct types of distribution of data:
1. Population distribution: characterizes the distribution of the elements of a population.
2. Sample distribution: characterizes the distribution of the elements of a sample drawn from a population.
3. Sampling distribution: describes the expected behavior of a statistic across a large number of simple random samples drawn from the same population.
Sampling distributions constitute the theoretical basis of statistical inference and are of considerable importance in business decision-making. They are important in statistics because they provide a major simplification on the route to statistical inference.
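The difference between the three distributions can be made concrete with a small simulation. The following is only a sketch: the exponential population, the sample size of 40, and the 2,000 repetitions are illustrative assumptions, not values from the lecture.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Population distribution: a hypothetical right-skewed population (mean about 10).
population = [random.expovariate(1 / 10) for _ in range(10_000)]

# Sample distribution: the values inside ONE sample of size n.
n = 40
one_sample = random.sample(population, n)

# Sampling distribution (approximated): the means of many samples of size n.
sample_means = [
    statistics.fmean(random.sample(population, n)) for _ in range(2_000)
]

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

print("population     mean, sd:", round(pop_mean, 2), round(pop_sd, 2))
print("one sample     mean, sd:", round(statistics.fmean(one_sample), 2),
      round(statistics.stdev(one_sample), 2))
print("sample means   mean, sd:", round(statistics.fmean(sample_means), 2),
      round(statistics.pstdev(sample_means), 2))
```

The sample means should centre on the population mean while varying much less than the individual population values do.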
DEFINITION
A sampling distribution is the theoretical probability distribution of a statistic obtained from a large number of samples of the same size drawn from a specific population.
The statistic may be, for example, the mean, the mean absolute value of the deviation from the mean, the range, the standard deviation of the sample, the unbiased estimate of variance, or the variance of the sample.
For example, the sampling distribution of the mean is the theoretical distribution of the means of an infinite number of samples of equal size taken from a population.
CHARACTERISTICS
A sampling distribution is a theoretical probability distribution.
It is usually a univariate distribution.
It often closely approximates a normal distribution.
The sample statistic (for example, the sample mean or the sample proportion) is a random variable.
The form of a sampling distribution refers to the shape of the particular curve that describes the distribution.
FUNCTIONS OF SAMPLING DISTRIBUTION
A sampling distribution can be plotted to show graphically how a statistic varies from sample to sample.
A sampling distribution can be constructed for:
Mean
Mean absolute value of the deviation from the mean
Range
Standard deviation of the sample
Unbiased estimate of variance
Variance of the sample
WHY IS THE SAMPLING DISTRIBUTION IMPORTANT?
i) Properties of statistics
ii) Selection of distribution type to model scores
iii) Hypothesis testing
i) Properties of Statistics :
Statistics have different properties as estimators of population parameters. The sampling distribution of a statistic provides a window into some of these properties. For example, if the expected value of a statistic is equal to the corresponding population parameter, the statistic is said to be unbiased.
Efficiency is another valuable property in the estimation of a population parameter: everything else being equal, the statistic with the smallest standard error is preferred as an estimator of the corresponding population parameter.
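Unbiasedness can be checked exactly on a tiny population by averaging a statistic over every possible sample. The sketch below assumes the five-number population 2, 3, 6, 8, 11 and samples of size 2 drawn with replacement; the helper names are illustrative only.

```python
from fractions import Fraction
from itertools import product

population = [2, 3, 6, 8, 11]
n = 2  # sample size


def mean(values):
    """Exact arithmetic mean as a Fraction."""
    return Fraction(sum(values), len(values))


def variance(values, ddof=0):
    """Sum of squared deviations divided by (len(values) - ddof)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - ddof)


# Every equally likely sample of size n drawn with replacement.
samples = list(product(population, repeat=n))

# Expected value of each statistic = its average over all samples.
expected_sample_mean = mean([mean(s) for s in samples])
expected_var_divisor_n = mean([variance(s, ddof=0) for s in samples])
expected_var_divisor_n_minus_1 = mean([variance(s, ddof=1) for s in samples])

print("population mean:", mean(population), "| E[sample mean]:", expected_sample_mean)
print("population variance:", variance(population))
print("E[variance, divisor n]:", expected_var_divisor_n)
print("E[variance, divisor n-1]:", expected_var_divisor_n_minus_1)
```

Under these assumptions the sample mean and the divisor n−1 variance average out to the population values, while the divisor n variance falls short, which is what "biased" means here.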
ii) Selection of distribution type to model scores :
The sampling distribution provides the theoretical foundation for selecting a distribution for many useful measures. For example, the central limit theorem explains why a measure such as intelligence, which may be considered a sum of a number of independent quantities, would be expected to follow an approximately normal (Gaussian) curve.

iii) Hypothesis Testing :
The sampling distribution is integral to the hypothesis testing procedure. It is used to create a model of what the world would look like if the null hypothesis were true and the statistic were collected an infinite number of times. A single sample is taken, the sample statistic is calculated, and then it is compared to the model created by the sampling distribution of that statistic when the null hypothesis is true. If the sample statistic is unlikely given the model, then the model is rejected and a model with real effects is considered more likely.
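As a rough illustration of this procedure, the sketch below builds the exact sampling distribution of the number of heads under a null hypothesis of a fair coin and compares one observed result with it. The 50 tosses, the observed 33 heads, and the 0.05 cut-off are hypothetical values chosen for the example.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_tosses = 50
null_p = 0.5          # null hypothesis: the coin is fair
observed_heads = 33   # hypothetical observed sample statistic

# Sampling distribution of the statistic when the null hypothesis is true.
null_distribution = [binomial_pmf(k, n_tosses, null_p) for k in range(n_tosses + 1)]

# Two-sided p-value: probability, under the null model, of a result at least
# as far from the expected count as the one observed.
expected_heads = n_tosses * null_p
observed_distance = abs(observed_heads - expected_heads)
p_value = sum(
    prob
    for k, prob in enumerate(null_distribution)
    if abs(k - expected_heads) >= observed_distance
)

alpha = 0.05  # assumed significance level
reject_null = p_value < alpha
print("p-value:", round(p_value, 4), "| reject null:", reject_null)
```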
TYPES OF SAMPLING DISTRIBUTION
The types of sampling distribution are as follows:
1) Sampling Distribution of the Mean: The sampling distribution of the mean is the theoretical probability distribution of the sample means obtained by drawing all possible samples of the same size from the given population.
Consider a population with mean $\mu$ and variance $\sigma^2$. When sampling from a normally distributed population, it can be shown that the mean of samples of size $n$ is itself normally distributed, with mean $\mu$ and variance $\sigma^2/n$.
CENTRAL LIMIT THEOREM
The central limit theorem, first introduced by De Moivre during the early eighteenth century, is often regarded as the most important theorem in statistics. According to this theorem, if we select a large number of simple random samples of size n from any population distribution with mean µ and finite variance σ², and determine the mean of each sample, the distribution of these sample means will tend, for large n, to be described by the normal probability distribution with mean µ and variance σ²/n.
In other words, the sampling distribution of sample means approaches a normal distribution as the sample size increases.
Symbolically, the theorem can be stated as follows:
Given n independent random variables $X_1, X_2, X_3, \dots, X_n$ which have the same distribution (no matter what distribution), the sum

$X = X_1 + X_2 + X_3 + \dots + X_n$

is approximately a normal variate when n is large. The mean $\mu$ and variance $\sigma^2$ of X are

$\mu = \mu_1 + \mu_2 + \mu_3 + \dots + \mu_n = n\mu_1$
$\sigma^2 = \sigma_1^2 + \sigma_2^2 + \sigma_3^2 + \dots + \sigma_n^2 = n\sigma_1^2$

where $\mu_1$ and $\sigma_1^2$ are the mean and variance of $X_1$.
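The relations for the mean and variance of a sum can be checked numerically. This is a minimal simulation sketch, assuming each variable is uniform on (0, 1), so that its mean is 1/2 and its variance is 1/12; the values n = 30 and 20,000 repetitions are arbitrary choices.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

n = 30                 # number of variables in each sum
repetitions = 20_000   # number of simulated sums
mu_1 = 0.5             # mean of one uniform(0, 1) variable
var_1 = 1 / 12         # variance of one uniform(0, 1) variable

# Each entry is one realisation of X = X_1 + X_2 + ... + X_n.
sums = [sum(random.random() for _ in range(n)) for _ in range(repetitions)]

mean_of_sums = statistics.fmean(sums)
var_of_sums = statistics.pvariance(sums)

# For a normal variate about 68.3 % of values lie within one standard deviation.
sd = (n * var_1) ** 0.5
within_one_sd = sum(abs(s - n * mu_1) <= sd for s in sums) / repetitions

print("mean of sums:", round(mean_of_sums, 3), "| n * mu_1 =", n * mu_1)
print("variance of sums:", round(var_of_sums, 3), "| n * var_1 =", round(n * var_1, 3))
print("share within one sd:", round(within_one_sd, 3))
```

Although each uniform variable is far from bell-shaped, the share of sums within one standard deviation should come out close to the normal value.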
UTILITY :
The utility of this theorem is that it requires virtually no conditions on the distribution patterns of the individual random variables being summed. As a result, it furnishes a practical method of computing approximate probability values associated with sums of arbitrarily distributed independent random variables.
This theorem helps to explain why a vast number of phenomena show approximately a normal distribution. Because of its theoretical and practical significance, it is often considered the most remarkable theoretical formulation of all probability laws.
Moreover, most of hypothesis testing and sampling theory is based on this theorem, so the central limit theorem is perhaps the most fundamental result in all of statistics.
2) SAMPLING DISTRIBUTION OF THE PROPORTION :
The sampling distribution of the proportion describes how the sample proportion of successes varies across samples of the same size drawn from a population with a given proportion of successes.
Properties :
Sample proportions tend to target the value of the population proportion.
Under certain conditions, the distribution of the sample proportion can be approximated by a normal distribution.
Example:
Sampling distribution of the proportion of girls in two randomly selected births. The sample space is bb, bg, gb, gg, and all four outcomes are equally likely.
Probabilities:
P(0 girls) = 0.25
P(1 girl) = 0.50
P(2 girls) = 0.25
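The two-births example is small enough to enumerate completely. The sketch below lists the four equally likely outcomes and tabulates the sampling distribution of the proportion of girls; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for two births: ('b','b'), ('b','g'), ('g','b'), ('g','g').
outcomes = list(product("bg", repeat=2))

# Proportion of girls in each outcome, counted over the sample space.
proportion_counts = Counter(Fraction(o.count("g"), len(o)) for o in outcomes)

# Sampling distribution: each proportion with its probability.
distribution = {
    proportion: Fraction(count, len(outcomes))
    for proportion, count in sorted(proportion_counts.items())
}

# Mean of the sampling distribution of the proportion.
mean_proportion = sum(p * prob for p, prob in distribution.items())

for proportion, prob in distribution.items():
    print(f"proportion of girls = {proportion}: probability = {prob}")
print("mean of sample proportions:", mean_proportion)
```

The mean of the sample proportions equals 1/2, the population proportion of girls assumed by the equally-likely sample space, which illustrates the "targeting" property.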
STANDARD ERROR OF THE SAMPLING DISTRIBUTION
The sampling distribution has a standard deviation. The mean of the sampling distribution of the mean will be the same as the population mean, but its standard deviation will be smaller than the population standard deviation (for samples of more than one observation).
The standard deviation of the sampling distribution has a special name: 'the standard error' or, for the sample mean, 'the standard error of the mean'.
The variation of sample means around the population mean is the sampling error and is measured using a statistic known as the standard error of the mean. This is an estimate of the amount by which a sample mean is likely to differ from the population mean. This consideration is important because, when the sampling distribution is approximately normal, sampling theory tells us that about 68 % of all sample means will lie within ± one standard error of the population mean, and that 95 % of all sample means will lie within ± 1.96 standard errors of the population mean.
Formula :
The standard error of the sampling distribution of the mean is equal to the standard deviation of the population divided by the square root of the sample size. The formula of the standard error is as follows:
$\sigma_{\bar{x}} = \sigma / \sqrt{N}$
Here,
$\sigma_{\bar{x}}$ = standard deviation of the sample mean (the standard error).
$\sigma$ = standard deviation of the population.
$N$ = sample size.
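A short sketch of the formula, taking N to be the number of observations in each sample. The population standard deviation of 10 is a made-up value used only to show how the standard error shrinks as the sample grows.

```python
from math import sqrt


def standard_error(sigma, n):
    """Standard error of the mean: sigma divided by the square root of n."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sigma / sqrt(n)


sigma = 10.0  # hypothetical population standard deviation

for n in (4, 25, 100, 400):
    print(f"n = {n:3d}  standard error = {standard_error(sigma, n)}")
# n =   4  standard error = 5.0
# n =  25  standard error = 2.0
# n = 100  standard error = 1.0
# n = 400  standard error = 0.5
```

Quadrupling the sample size halves the standard error, because the sample size enters through its square root.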
How to reduce Error :
When the sample size increases, the sampling error (as measured by the standard error) decreases.
Purpose :
1. Allows us to quantify the extent to which a 'test' provides accurate scores.
2. If the standard error is smaller, the interval estimate for the population mean will be narrower.
3. If the standard error is larger, the interval estimate for the population mean will be wider.

Application :
95 % CI = Mean ± ( 1.96 × SEM )
99 % CI = Mean ± ( 2.58 × SEM )
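The two interval formulas can be applied as follows. This is a sketch with hypothetical inputs (sample mean 50, population standard deviation 10, sample size 25); it assumes the population standard deviation is known, so that the normal multipliers 1.96 and 2.58 are appropriate.

```python
from math import sqrt


def confidence_interval(sample_mean, sigma, n, z):
    """Return (lower, upper) = sample_mean -/+ z * SEM, where SEM = sigma / sqrt(n)."""
    sem = sigma / sqrt(n)
    margin = z * sem
    return sample_mean - margin, sample_mean + margin


# Hypothetical inputs, not taken from the lecture.
sample_mean, sigma, n = 50.0, 10.0, 25   # SEM = 10 / 5 = 2

ci_95 = confidence_interval(sample_mean, sigma, n, z=1.96)
ci_99 = confidence_interval(sample_mean, sigma, n, z=2.58)

print("95 % CI:", tuple(round(v, 2) for v in ci_95))  # (46.08, 53.92)
print("99 % CI:", tuple(round(v, 2) for v in ci_99))  # (44.84, 55.16)
```

The 99 % interval is wider than the 95 % interval for the same data, since more confidence requires a larger multiple of the standard error.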
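STANDARD ERROR TABLE
Here N is the sample size, $\sigma$ is the population standard deviation, $\mu_2$ and $\mu_4$ are the second and fourth population moments about the mean, and $v$ is the population coefficient of variation.
SAMPLING DISTRIBUTION : STANDARD ERROR
Means : $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}}$
Proportions : $\sigma_p = \sqrt{\frac{p(1-p)}{N}} = \sqrt{\frac{pq}{N}}$
Standard deviations : (1) $\sigma_s = \frac{\sigma}{\sqrt{2N}}$ ; (2) $\sigma_s = \sqrt{\frac{\mu_4 - \mu_2^2}{4N\mu_2}}$
Medians : $\sigma_{med} = \sigma\sqrt{\frac{\pi}{2N}} = \frac{1.2533\,\sigma}{\sqrt{N}}$
First & third quartiles : $\sigma_{Q_1} = \sigma_{Q_3} = \frac{1.3626\,\sigma}{\sqrt{N}}$
Semi-interquartile ranges : $\sigma_Q = \frac{0.7867\,\sigma}{\sqrt{N}}$
Variances : (1) $\sigma_{s^2} = \sigma^2\sqrt{\frac{2}{N}}$ ; (2) $\sigma_{s^2} = \sqrt{\frac{\mu_4 - \mu_2^2}{N}}$
Coefficients of variation : $\sigma_v = \frac{v}{\sqrt{2N}}\sqrt{1 + 2v^2}$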
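Entries of this kind can be checked by simulation. The sketch below compares the simulated standard errors of the mean and the median with σ/√N and 1.2533σ/√N; it assumes a standard normal population, samples of size 100, and 5,000 repetitions, all of which are illustrative choices. The median formula is a large-sample result, so only approximate agreement should be expected.

```python
import random
import statistics
from math import sqrt

random.seed(2)  # fixed seed so the sketch is reproducible

sigma = 1.0         # population standard deviation (normal population assumed)
n = 100             # size of each sample
repetitions = 5_000

means = []
medians = []
for _ in range(repetitions):
    sample = [random.gauss(0.0, sigma) for _ in range(n)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

# Simulated standard errors: spread of each statistic across the samples.
se_mean_simulated = statistics.pstdev(means)
se_median_simulated = statistics.pstdev(medians)

# Large-sample formulas for the mean and the median.
se_mean_formula = sigma / sqrt(n)
se_median_formula = 1.2533 * sigma / sqrt(n)

print("mean:   simulated", round(se_mean_simulated, 4), "| formula", round(se_mean_formula, 4))
print("median: simulated", round(se_median_simulated, 4), "| formula", round(se_median_formula, 4))
```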
Point & Interval Estimates
There are two kinds of estimates of population parameters from sample statistics: point estimates and interval estimates. A point estimate is a single value and an interval estimate is a range of values.
POINT ESTIMATION :
A point estimate of a population parameter is a single value of a statistic.
For example, the sample mean x̄ is a point estimate of the population mean μ. Similarly, the sample proportion p is a point estimate of the population proportion P.

INTERVAL ESTIMATION :
An interval estimate is defined by two numbers, between which a population parameter is said to lie.
For example, a < μ < b is an interval estimate of the population mean μ. It indicates that the population mean is greater than a but less than b.

In any estimation problem, we need to obtain both a point estimate and an interval estimate. The point estimate is our best guess of the true value of the parameter, while the interval estimate gives a measure of accuracy of that point estimate by providing an interval that contains plausible values.
PROBLEMS
Sampling Distribution of means
Prob. 1 :
A population consists of the five numbers 2, 3, 6, 8 and 11. Consider all possible samples of size 2 that can be drawn (i) with replacement and (ii) without replacement from this population. Find:
a) The mean of the population.
b) The standard deviation of the population.
c) The mean of the sampling distribution of means.
d) The standard deviation of the sampling distribution of means (the standard error of means).
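Answer :
a) Mean of the population:
$\mu = \frac{2+3+6+8+11}{5} = \frac{30}{5} = 6$
b) Variance of the population, where $N_p = 5$ is the population size:
$\sigma^2 = \frac{\sum (x-\mu)^2}{N_p} = \frac{(2-6)^2+(3-6)^2+(6-6)^2+(8-6)^2+(11-6)^2}{5} = \frac{16+9+0+4+25}{5} = \frac{54}{5} = 10.8$
$\therefore \sigma = \sqrt{10.8} = 3.29$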
With replacement :
c) There are 5(5) = 25 samples of size 2 that can be drawn with replacement. These are:
(2,2) (2,3) (2,6) (2,8) (2,11)
(3,2) (3,3) (3,6) (3,8) (3,11)
(6,2) (6,3) (6,6) (6,8) (6,11)
(8,2) (8,3) (8,6) (8,8) (8,11)
(11,2) (11,3) (11,6) (11,8) (11,11)
The corresponding sample means are:
2.0 2.5 4.0 5.0 6.5
2.5 3.0 4.5 5.5 7.0
4.0 4.5 6.0 7.0 8.5
5.0 5.5 7.0 8.0 9.5
6.5 7.0 8.5 9.5 11.0
The mean of the sampling distribution of means is
$\mu_{\bar{x}} = \frac{\text{sum of all sample means}}{25} = \frac{150}{25} = 6.0$
illustrating the fact that $\mu_{\bar{x}} = \mu$.
d) The variance of the sampling distribution of means is obtained by subtracting the mean 6 from each of the 25 sample means, squaring the result, adding all 25 numbers thus obtained, and dividing by 25:
$\sigma_{\bar{x}}^2 = \frac{(2.0-6)^2 + (2.5-6)^2 + \dots + (11.0-6)^2}{25} = \frac{135}{25} = 5.40$
$\sigma_{\bar{x}} = \sqrt{5.40} = 2.32$
This illustrates the fact that, for finite populations involving sampling with replacement, $\sigma_{\bar{x}}^2 = \frac{\sigma^2}{N}$, where $N = 2$ is the sample size: the right-hand side is 10.8/2 = 5.40, agreeing with the above value.
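The with-replacement results can be reproduced by enumerating all 25 samples. This is a small sketch using exact fractions; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

population = [2, 3, 6, 8, 11]
sample_size = 2

# Population mean and variance (divisor = population size).
pop_mean = Fraction(sum(population), len(population))
pop_var = sum((x - pop_mean) ** 2 for x in population) / len(population)

# All ordered samples of size 2 drawn WITH replacement, and their means.
samples = list(product(population, repeat=sample_size))
sample_means = [Fraction(sum(s), sample_size) for s in samples]

# Mean and variance of the sampling distribution of means.
mean_of_means = sum(sample_means) / len(sample_means)
var_of_means = sum((m - mean_of_means) ** 2 for m in sample_means) / len(sample_means)

print(len(samples))                           # 25
print(float(mean_of_means))                   # 6.0
print(float(var_of_means))                    # 5.4
print(round(sqrt(var_of_means), 2))           # 2.32
print(var_of_means == pop_var / sample_size)  # True
```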

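Without Replacement:
c) There are 10 samples of size 2 that can be drawn without replacement from the population:

(2,3) (2,6) (2,8) (2,11) (3,6) (3,8) (3,11) (6,8) (6,11) (8,11)
The corresponding sample means are:
2.5, 4.0, 5.0, 6.5, 4.5, 5.5, 7.0, 7.0, 8.5, 9.5
The mean of the sampling distribution of means is
$\mu_{\bar{x}} = \frac{2.5 + 4.0 + \dots + 9.5}{10} = \frac{60}{10} = 6.0$
$\therefore \mu_{\bar{x}} = \mu$
d) The variance of the sampling distribution of means is
$\sigma_{\bar{x}}^2 = \frac{(2.5-6)^2 + (4.0-6)^2 + \dots + (9.5-6)^2}{10} = \frac{40.5}{10} = 4.05$
and $\sigma_{\bar{x}} = \sqrt{4.05} = 2.01$.
This illustrates the relation
$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{N}\left(\frac{N_p - N}{N_p - 1}\right)$
where $N_p = 5$ is the population size and $N = 2$ is the sample size, since the right-hand side is
$\frac{10.8}{2}\left(\frac{5-2}{5-1}\right) = 4.05$
as obtained above.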
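The without-replacement case can be checked in the same way by enumerating the 10 unordered samples and comparing the result with the finite-population formula. A minimal sketch, with illustrative variable names:

```python
from fractions import Fraction
from itertools import combinations
from math import sqrt

population = [2, 3, 6, 8, 11]
pop_size = len(population)   # N_p in the notes
sample_size = 2              # N in the notes

pop_mean = Fraction(sum(population), pop_size)
pop_var = sum((x - pop_mean) ** 2 for x in population) / pop_size

# All samples of size 2 drawn WITHOUT replacement (order ignored).
samples = list(combinations(population, sample_size))
sample_means = [Fraction(sum(s), sample_size) for s in samples]

mean_of_means = sum(sample_means) / len(sample_means)
var_of_means = sum((m - mean_of_means) ** 2 for m in sample_means) / len(sample_means)

# Finite-population formula: (sigma^2 / N) * (N_p - N) / (N_p - 1).
formula_var = (pop_var / sample_size) * Fraction(pop_size - sample_size, pop_size - 1)

print(len(samples))                  # 10
print(float(mean_of_means))          # 6.0
print(float(var_of_means))           # 4.05
print(round(sqrt(var_of_means), 2))  # 2.01
print(var_of_means == formula_var)   # True
```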
PROPORTIONS
Prob. 2 :
Find the probability that in 120 tosses of a fair coin,
a) between 40 % and 60 % will be heads, and
b) 5/8 or more will be heads.

Answer:
We consider the 120 tosses of the coin to be a sample from the infinite population of all possible tosses of the coin. In this population the probability of heads is p = 1/2 and the probability of tails is q = 1 − p = 1/2.
a) $\mu_P = p = \frac{1}{2} = 0.50$
$\sigma_P = \sqrt{\frac{pq}{N}} = \sqrt{\frac{\frac{1}{2}\left(\frac{1}{2}\right)}{120}} = 0.0456$
40 % in standard units $= \frac{0.40 - 0.50}{0.0456} = -2.19$
60 % in standard units $= \frac{0.60 - 0.50}{0.0456} = 2.19$
Required probability = (area under normal curve between z = −2.19 and z = 2.19)
= 2 ( 0.4857 )
= 0.9714
Although this result is accurate to two significant figures, it is not exact, since we have not used the fact that the proportion is actually a discrete variable. To account for this, we subtract 1/(2N) = 1/(2 × 120) from 0.40 and add 1/(2N) = 1/(2 × 120) to 0.60; thus, since 1/240 = 0.00417, the required proportions in standard units are
$\frac{0.40 - 0.00417 - 0.50}{0.0456} = -2.28$ and $\frac{0.60 + 0.00417 - 0.50}{0.0456} = 2.28$
The required probability is then the area under the normal curve between z = −2.28 and z = 2.28, which is 2 ( 0.4887 ) = 0.9774.
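The normal-approximation calculation in part (a), with and without the 1/(2N) adjustment, can be compared against the exact binomial probability of obtaining between 48 and 72 heads in 120 tosses. The sketch below uses only the standard library; because it does not round z to two decimals, its results may differ from table-based values in the fourth decimal place.

```python
from math import comb, erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


n = 120   # number of tosses
p = 0.5   # probability of heads for a fair coin
sigma_p = sqrt(p * (1 - p) / n)   # standard error of the proportion


def normal_prob_between(low, high):
    """Normal-approximation probability that the sample proportion lies in [low, high]."""
    z_low = (low - p) / sigma_p
    z_high = (high - p) / sigma_p
    return normal_cdf(z_high) - normal_cdf(z_low)


# Without the adjustment for discreteness.
plain = normal_prob_between(0.40, 0.60)

# With the adjustment: widen the interval by 1/(2N) at each end.
half_step = 1 / (2 * n)
corrected = normal_prob_between(0.40 - half_step, 0.60 + half_step)

# Exact binomial probability: 40 % and 60 % of 120 tosses are 48 and 72 heads.
exact = sum(comb(n, k) for k in range(48, 73)) / 2**n

print("standard error:", round(sigma_p, 4))
print("normal approximation:", round(plain, 4))
print("with 1/(2N) adjustment:", round(corrected, 4))
print("exact binomial:", round(exact, 4))
```

The adjusted approximation should sit noticeably closer to the exact binomial value than the unadjusted one.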
b) Proceeding as in (a), since 5/8 = 0.6250,
(0.6250 − 0.00417) in standard units $= \frac{0.6250 - 0.00417 - 0.50}{0.0456} = 2.65$
Required probability = (area under normal curve to right of z = 2.65)
= (area to right of z = 0) − (area between z = 0 and z = 2.65)
= 0.5 − 0.4960
= 0.0040
