Gate Scholarship Work - October: Sampling Fundamentals
By Siva Sankari S
Sampling Fundamentals
Some fundamental definitions
1. Universe/Population (finite or infinite)
A population can be defined as all the people or items with the characteristic one wishes to understand. There is very rarely enough time or money to gather information from everyone or everything in a population. The goal therefore becomes finding a representative sample (or subset) of that population.
Note also that the population from which the sample is drawn may not be the same as the population about which we actually want information. Often there is a large but not complete overlap between these two groups, for example because of sampling-frame issues. Sometimes the two may be entirely separate. For instance, we might study rats in order to get a better understanding of human health. Or we might study records of people born in 2008 in order to make predictions about people born in 2009.

2. Sampling frame:
It is a list of all the elements or subjects in the population from which the sample is drawn. The sampling frame could be prepared by the researcher, or an existing frame may be used. For example, a researcher may prepare a list of all the households of a locality that have pregnant women. Alternatively, the researcher may use the register of pregnant women for antenatal care kept by the local anganwadi worker.

3. Sampling design
It refers to the procedure the researcher would follow or adopt in selecting the sampling units from which inferences about the population are drawn.
A sample design is made up of two elements: the sampling method and the estimator. The sampling method refers to the rules and procedures by which some elements of the population are included in the sample. Some common sampling methods are simple random sampling, stratified sampling, and cluster sampling. The estimator is the procedure used to calculate sample statistics.

4. Statistic/parameter
A statistic is a numerical value based on a sample, whereas a parameter is a numerical value based on the population.
When we calculate the mean from a sample, it is called a statistic because it describes a characteristic of the sample. When the same mean is calculated from the population, it is called a parameter because it describes a characteristic of the population.
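The difference between a parameter and a statistic can be shown with a short Python sketch, which also covers the three sampling methods named under sampling design. The sampling frame below is entirely hypothetical: 1,000 made-up households, each tagged with an assumed ward and village and a random household size. Wards serve as strata and villages as clusters purely for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame: 1,000 households. "ward" (4 wards) is used as the
# stratum and "village" (50 villages of 20 households each) as the cluster.
frame = [
    {"id": i, "ward": i % 4, "village": i // 20, "size": rng.randint(1, 8)}
    for i in range(1000)
]

def mean_size(households):
    """Mean household size of a list of households."""
    return statistics.mean([h["size"] for h in households])

# Parameter: computed from every unit in the population.
population_mean = mean_size(frame)

# Simple random sampling: every household has the same chance of selection.
srs = rng.sample(frame, 50)

# Stratified sampling: a simple random sample of 10 households from each ward.
stratified = []
for ward in range(4):
    stratum = [h for h in frame if h["ward"] == ward]
    stratified.extend(rng.sample(stratum, 10))

# Cluster sampling: pick 3 whole villages at random and keep all their households.
chosen_villages = set(rng.sample(range(50), 3))
cluster = [h for h in frame if h["village"] in chosen_villages]

print(f"parameter (population mean): {population_mean:.3f}")
for name, sample in [("simple random", srs), ("stratified", stratified), ("cluster", cluster)]:
    print(f"statistic ({name}, n={len(sample)}): {mean_size(sample):.3f}")
```

Each sample mean is a statistic that varies from sample to sample. The population mean is fixed.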
5. Sampling error:
The sampling error is a number that describes the precision of an estimate obtained from a sample. It is usually expressed as a margin of error associated with a stated confidence level.
For example, a presidential preference poll may report that the incumbent is favored by 51% of the voters, with a margin of error of plus or minus 3 points at a confidence level of 95%. Suppose the same survey were repeated with 100 different samples of voters, and a ±3-point interval were built around each sample's estimate. About 95 of those intervals would then be expected to contain the true share of voters favoring the incumbent. For the sample actually reported, the interval is 48% to 54% (51% ± 3%).

6. Precision:
Precision is the range within which the population average (or other parameter) will lie, in accordance with the reliability specified by the confidence level. It is expressed either as a percentage of the estimate (±) or as a numerical quantity.

7. Confidence level and significance level:
The confidence level is the expected percentage of times that the actual value will fall within the stated precision limits.
Precision is the range within which the answer may vary and still be acceptable. The confidence level indicates the likelihood that the answer will fall within that range, and the significance level indicates the likelihood that it will fall outside that range. The significance level is 1 − confidence level; for example, it is 5% for a 95% confidence level.
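The poll example can be checked numerically. The sketch below assumes a sample of 1,000 voters, since the source does not state the sample size. It uses the usual normal-approximation margin of error for a proportion, $z\sqrt{\hat{p}(1-\hat{p})/n}$, with z = 1.96 for 95% confidence. It then repeats the hypothetical survey many times to show that roughly 95% of the resulting intervals contain the true proportion, which is what the confidence level promises.

```python
import math
import random

def margin_of_error(p_hat, n, z=1.96):
    """Normal-approximation margin of error for a sample proportion."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

def coverage(true_p, n, trials, seed=7):
    """Share of simulated surveys whose p_hat ± margin contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        p_hat = sum(rng.random() < true_p for _ in range(n)) / n
        moe_i = margin_of_error(p_hat, n)
        if p_hat - moe_i <= true_p <= p_hat + moe_i:
            hits += 1
    return hits / trials

n_voters = 1000  # assumed sample size (not given in the source)
moe = margin_of_error(0.51, n_voters)
print(f"51% ± {100 * moe:.1f} points")  # prints: 51% ± 3.1 points

cov = coverage(0.51, n_voters, trials=2000)
print(f"share of 95% intervals containing the true p: {cov:.3f}")
```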
8. Sampling distribution
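The concept of a sampling distribution is perhaps the most basic concept in inferential statistics. A sampling distribution is the probability distribution of a statistic, such as the sample mean, over all possible samples of a given size drawn from the same population.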
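The idea of a sampling distribution can be made concrete by enumerating one exactly. The sketch below uses a tiny, made-up population of four values and takes every possible sample of size 2 drawn without replacement. The mean of all the sample means equals the population mean, which is why the sample mean is called an unbiased estimator.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from statistics import mean

population = [2, 4, 6, 8]  # hypothetical population
n = 2                      # sample size

# Every possible sample of size n (without replacement) and its mean.
sample_means = [Fraction(sum(s), n) for s in combinations(population, n)]

# Sampling distribution: each distinct sample mean and its probability.
counts = Counter(sample_means)
distribution = {m: Fraction(c, len(sample_means)) for m, c in sorted(counts.items())}

for m, prob in distribution.items():
    print(f"sample mean {m}: probability {prob}")

print("mean of sample means:", mean(sample_means))
print("population mean:     ", mean(population))
```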
2. The sampling distribution of proportion (binomial)
Usually the statistics of attributes correspond to the conditions of a binomial distribution, which tends towards the normal distribution as n becomes larger and larger. Let p represent the proportion of defectives, i.e., of successes, and q the proportion of non-defectives, i.e., of failures (q = 1 − p). If the sample proportion is treated as a random variable, then the sampling distribution of the proportion of successes has mean p and standard deviation $\sqrt{pq/n}$, where n is the sample size.

3. Student's t-distribution
The variable t differs from z in that we use the sample standard deviation in the calculation of t, whereas we use the standard deviation of the population in the calculation of z: $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ and $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$.
There is a different t distribution for every possible sample size, i.e., for every number of degrees of freedom. The degrees of freedom for a sample of size n are n − 1. As the sample size gets larger, the shape of the t distribution approaches that of the normal distribution.

4. F distribution:
• Suppose two independent samples of sizes $n_1$ and $n_2$ are drawn from two independent normal populations having the same variance. The ratio of their sample variances, $F = s_1^2 / s_2^2$, then follows an F distribution with $(n_1 - 1, n_2 - 1)$ degrees of freedom.
• The calculated value of F from the sample data is compared with the corresponding table value of F. If the former is equal to or exceeds the latter, we infer that the null hypothesis of equal variances cannot be accepted.
• We shall make use of the F ratio in the context of hypothesis testing and also in the ANOVA technique.

5. Chi-square distribution:
• The distribution is not symmetrical, and all its values are positive. When used with a sample of size n, it has (n − 1) degrees of freedom.
• The chi-square distribution is encountered when we deal with collections of values that involve adding up squares.

Point estimator:
A point estimator is a formula that uses sample data to calculate a single number (a sample statistic) that can be used as an estimate of a population parameter. For example, $\bar{x}$ and s are used to estimate μ and σ.
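A quick simulation can confirm the mean and standard deviation of the sampling distribution of a proportion. The values p = 0.1 (10% defectives) and n = 100 are assumptions chosen for illustration. The simulated proportions are summarised with the point estimators $\bar{x}$ and s and compared with p and $\sqrt{pq/n}$.

```python
import math
import random
import statistics

def simulate_proportions(p, n, reps, seed=0):
    """Draw `reps` samples of size n from a population with success rate p
    and return the sample proportion of successes from each one."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

p, n = 0.1, 100  # assumed: 10% defectives, samples of 100 items
props = simulate_proportions(p, n, reps=5000)

theory_sd = math.sqrt(p * (1 - p) / n)  # sqrt(pq/n)
print(f"mean of sample proportions: {statistics.mean(props):.4f} (theory {p})")
print(f"SD of sample proportions:   {statistics.stdev(props):.4f} (theory {theory_sd:.4f})")
```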
Central Limit Theorem:
The central limit theorem states the following. Given a population with a finite mean μ and a finite non-zero variance σ², the sampling distribution of the mean approaches a normal distribution with mean μ and variance σ²/n as n, the sample size, increases.

Example of the Central Limit Theorem in practice:
Roll 30 dice and calculate the average (sample mean) of the numbers that you get on the dice. Now repeat this experiment 1000 times, each time rolling 30 dice and computing a new sample mean. Plot a histogram of the 1000 sample means that you have obtained. This plot will look approximately normal.

SAMPLING THEORY
(i) Statistical estimation:
Sampling theory helps in estimating unknown population parameters from a knowledge of statistical measures based on sample studies. In other words, obtaining an estimate of a parameter from a statistic is the main objective of sampling theory.
The estimate can be either a point estimate or an interval estimate. A point estimate is a single estimate expressed in the form of a single figure. An interval estimate has two limits, viz., the upper limit and the lower limit, within which the parameter value may lie. Interval estimates are often used in statistical induction.

(ii) Testing of hypotheses:
The second objective of sampling theory is to enable us to decide whether to accept or reject a hypothesis. Sampling theory helps in determining whether observed differences are actually due to chance or whether they are really significant.
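The dice experiment described under the Central Limit Theorem can be simulated in a few lines of Python. A text histogram stands in for the plot, so the sketch needs only the standard library. For a fair die μ = 3.5 and σ² = 35/12. The theorem therefore predicts that the 1000 sample means cluster around 3.5 with standard deviation $\sqrt{(35/12)/30} \approx 0.31$.

```python
import math
import random
import statistics

def dice_sample_means(n_dice=30, n_experiments=1000, seed=1):
    """Roll n_dice fair dice, average them, and repeat n_experiments times."""
    rng = random.Random(seed)
    return [
        sum(rng.randint(1, 6) for _ in range(n_dice)) / n_dice
        for _ in range(n_experiments)
    ]

def text_histogram(values, bins=12, width=40):
    """Print a horizontal bar chart of `values` split into equal-width bins."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / step), bins - 1)] += 1
    peak = max(counts)
    for i, c in enumerate(counts):
        print(f"{lo + i * step:5.2f} | {'#' * round(width * c / peak)}")

means = dice_sample_means()
text_histogram(means)

predicted_sd = math.sqrt((35 / 12) / 30)  # sigma / sqrt(n) for a fair die
print(f"mean of sample means: {statistics.mean(means):.3f} (theory 3.5)")
print(f"SD of sample means:   {statistics.stdev(means):.3f} (theory {predicted_sd:.3f})")
```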
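i) To test the significance of the mean of a random sample
Researchers can use Sandler's A-test when correlated (paired) samples are employed and the hypothesised mean difference is taken as zero, i.e., $H_0: \mu_D = 0$.
Psychologists generally use this test for two groups that are matched with respect to some extraneous variable.
While using the A-test, we work out the A-statistic, which yields exactly the same results as Student's t-test.
The A-statistic is found as follows:
$A = \frac{\sum D_i^2}{\left(\sum D_i\right)^2}$
Here $D_i$ is the difference between the i-th pair of matched observations. The computed A is compared with the table value of A for (n − 1) degrees of freedom. $H_0$ is rejected if the computed A is equal to or less than the table value.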
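Computing Sandler's A-statistic and the paired t-statistic side by side shows why they lead to the same conclusion. Algebraically, $A = \frac{n-1}{n t^2} + \frac{1}{n}$, so a large |t| corresponds to a small A. The matched-pair scores below are hypothetical. Critical values of A still have to be read from a published A table, which this sketch does not reproduce.

```python
import math
import statistics

def sandlers_a(differences):
    """Sandler's A = (sum of squared differences) / (sum of differences)^2.
    Undefined when the differences sum to zero."""
    return sum(d * d for d in differences) / sum(differences) ** 2

def paired_t(differences):
    """Paired t-statistic for H0: mean difference = 0."""
    n = len(differences)
    return statistics.mean(differences) / (statistics.stdev(differences) / math.sqrt(n))

# Hypothetical scores for 8 matched pairs (e.g. before/after a treatment).
before = [10, 12, 9, 14, 11, 13, 8, 15]
after = [12, 14, 10, 15, 14, 13, 10, 18]
diffs = [a - b for a, b in zip(after, before)]

n = len(diffs)
a_stat = sandlers_a(diffs)
t_stat = paired_t(diffs)

print(f"A = {a_stat:.4f}, t = {t_stat:.3f}, df = {n - 1}")
# Identity linking the two statistics: A = (n - 1) / (n t^2) + 1 / n
print(f"A recomputed from t = {(n - 1) / (n * t_stat ** 2) + 1 / n:.4f}")
```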
CONCEPT OF STANDARD ERROR
The standard deviation of the sampling distribution of a statistic is called its standard error. E.g. if different samples of the same size n are drawn from a population, we get different values of the sample mean $\bar{x}$. The S.D. of $\bar{x}$ is called the standard error of $\bar{x}$. For a population with standard deviation σ, it equals $\sigma/\sqrt{n}$. It is obvious that the standard error of $\bar{x}$ will depend upon the size of the sample and the variability of the population.
Standard error statistics are a class of inferential statistics. They function somewhat like descriptive statistics in that they permit the researcher to construct confidence intervals about the obtained sample statistic. The confidence interval so constructed provides an estimate of the interval in which the population parameter is likely to fall. The two most commonly used standard error statistics are the standard error of the mean and the standard error of the estimate.
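The standard error of the mean, and the confidence interval built from it, can be sketched as follows. The sketch estimates σ by the sample standard deviation s, so SE = s/√n. It uses the normal multiplier z (1.96 for 95%), which is reasonable only for fairly large samples; small samples would call for the t distribution instead. The data are generated from an assumed normal population, and the final loop shows how σ/√n shrinks as n grows for an assumed σ = 10.

```python
import math
import random
import statistics
from statistics import NormalDist

def standard_error_of_mean(sample):
    """Estimated standard error of the mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation interval: mean ± z * SE."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    m = statistics.mean(sample)
    se = standard_error_of_mean(sample)
    return m - z * se, m + z * se

# Hypothetical data: 100 observations from an assumed N(50, 10^2) population.
rng = random.Random(3)
sample = [rng.gauss(50, 10) for _ in range(100)]

se = standard_error_of_mean(sample)
low, high = mean_confidence_interval(sample)
print(f"mean = {statistics.mean(sample):.2f}, SE = {se:.3f}, "
      f"95% CI = ({low:.2f}, {high:.2f})")

# For an assumed sigma of 10, sigma / sqrt(n) halves each time n quadruples.
for n in (25, 100, 400):
    print(f"n = {n:3d}: standard error = {10 / math.sqrt(n):.2f}")
```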
Factors affecting the size of the sample:
• Nature of the universe
• Number of classes proposed
• Nature of the study
• Type of sampling
• Standard of accuracy and acceptable confidence level
• Availability of finance
• Other considerations
However much care is taken in selecting a sample, there will always be a difference between the parameter and its corresponding estimate. A sample with the smallest sampling error will generally be considered a good representative of the population.
Bigger samples have smaller sampling errors; when the sample survey becomes a census survey, the sampling error becomes zero. On the other hand, smaller samples may be easier to manage and have less non-sampling error. Handling bigger samples is more expensive than handling smaller ones, and the non-sampling error increases with the increase in sample size.

There are various approaches for computing the sample size [5, 57, 117]. To determine the appropriate sample size, the basic factors to be considered are the level of precision required by users, the confidence level desired, and the degree of variability.

i) Level of precision:
The sample size is to be determined according to some pre-assigned 'degree of precision'. The 'degree of precision' is the margin of permissible error between the estimated value and the population value. In other words, it is a measure of how close an estimate is to the actual characteristic in the population. The level of precision may be termed the sampling error. According to W. G. Cochran (1977), the desired precision may be specified by stating the amount of error that one is willing to tolerate in the sample estimates.
The difference between the sample statistic and the related population parameter is called the sampling error. The acceptable level of precision depends on the amount of risk a researcher is willing to accept while using the data to make decisions. It is often expressed as a percentage. Suppose the sampling error, or margin of error, is ±5% and 70% of the units in the sample possess some attribute. It can then be concluded, at the chosen confidence level, that 65% to 75% of the units in the population possess that attribute.
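The three factors named above (precision, confidence level, variability) combine in the familiar formula for estimating a proportion, $n_0 = z^2 p q / e^2$, usually attributed to Cochran. The sketch below is one illustration under that assumption and ignores any finite-population correction. Here e is the margin of error as a proportion, and p is a prior guess of the proportion; p = 0.5 gives the most conservative, i.e. largest, n.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(p, e, confidence=0.95):
    """Cochran-style n0 = z^2 * p * q / e^2, rounded up to a whole unit."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)

# Precision ±5% with 70% expected to possess the attribute
print(sample_size_for_proportion(0.7, 0.05))  # 323
# Most conservative case, p = 0.5
print(sample_size_for_proportion(0.5, 0.05))  # 385
# Higher precision (±2%) needs a much larger sample
print(sample_size_for_proportion(0.5, 0.02))  # 2401
```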
A high level of precision requires larger sample sizes and a higher cost to achieve those samples.

ii) Confidence level desired:
The confidence or risk level is ascertained through a well-established probability model, the normal distribution, and an associated theorem, the Central Limit Theorem.
The probability density function (p.d.f.) of the normal distribution with parameters μ and σ is given by
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \quad -\infty < x < \infty$

DETERMINATION OF SAMPLE SIZE THROUGH THE APPROACH BASED ON BAYESIAN STATISTICS
This approach of determining n is known as the Bayesian approach. The procedure for finding the optimal value of n, i.e., the size of the sample, under this approach is as follows:
(i) Find the expected value of sample information (EVSI) for every possible n;
(ii) Work out a reasonably approximated cost of taking a sample of every possible n;
(iii) Compare the EVSI and the cost of the sample for every possible n. In other words, work out the expected net gain (ENG) for every possible n as stated below:
For a given sample size n: (EVSI) − (Cost of sample) = (ENG)
(iv) From (iii) above, the optimal sample size can be determined. It is the value of n which maximises the difference between the EVSI and the cost of the sample.
Computing the EVSI for every possible n and then comparing it with the respective cost is often a very cumbersome task. It is generally feasible only with mechanised or computer help. Hence, this approach, although theoretically optimal, is rarely used in practice.
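The four-step Bayesian procedure can be sketched for one textbook case where EVSI has a closed form. This is the classical Raiffa and Schlaifer setting: a choice between two actions whose payoff difference is linear in an unknown mean μ, a normal prior on μ, and normally distributed observations. The model itself is an assumption, because the source does not specify how EVSI is to be computed. Every number below is hypothetical: the prior, the sampling standard deviation, the break-even value, the payoff slope, and the fixed and per-unit sampling costs.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def unit_normal_loss(u):
    """G(u) = E[(Z - u)^+] for a standard normal Z."""
    return STD_NORMAL.pdf(u) - u * (1 - STD_NORMAL.cdf(u))

# Hypothetical two-action problem: payoff difference = K * (mu - BREAKEVEN).
PRIOR_MEAN, PRIOR_SD = 48.0, 5.0     # normal prior on the unknown mean mu
SAMPLING_SD = 20.0                   # SD of a single observation
BREAKEVEN = 50.0                     # value of mu at which both actions pay the same
K = 1000.0                           # payoff per unit of mu
FIXED_COST, UNIT_COST = 100.0, 2.0   # hypothetical cost of sampling

def evsi(n):
    """Step (i): expected value of sample information for sample size n."""
    if n == 0:
        return 0.0
    # SD of the posterior mean before the sample is seen (preposterior SD).
    prepost_sd = PRIOR_SD ** 2 * math.sqrt(n / (SAMPLING_SD ** 2 + n * PRIOR_SD ** 2))
    return K * prepost_sd * unit_normal_loss(abs(PRIOR_MEAN - BREAKEVEN) / prepost_sd)

def cost(n):
    """Step (ii): approximate cost of taking a sample of size n."""
    return 0.0 if n == 0 else FIXED_COST + UNIT_COST * n

def eng(n):
    """Step (iii): expected net gain = EVSI - cost of sample."""
    return evsi(n) - cost(n)

# Step (iv): the optimal n maximises ENG over the sizes considered.
candidates = range(0, 501)
best_n = max(candidates, key=eng)
evpi = K * PRIOR_SD * unit_normal_loss(abs(PRIOR_MEAN - BREAKEVEN) / PRIOR_SD)

print(f"optimal n = {best_n}, EVSI = {evsi(best_n):.1f}, "
      f"cost = {cost(best_n):.1f}, ENG = {eng(best_n):.1f}")
print(f"EVPI (upper bound on any EVSI) = {evpi:.1f}")
```

EVSI rises with n but never exceeds the expected value of perfect information (EVPI), while the cost keeps growing linearly. This is why ENG has an interior maximum.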