1. Sampling Methods and Sample Size Determination
By Gerbaba Guta
2020
Sampling methods and sample size determination
Sample
• In research terms a sample is a group of people,
objects, or items that are taken from a larger
population for measurement.
• The sample should be representative of the
population to ensure that we can generalize the
findings from the research sample to the population
as a whole
Cont…
Sampling
• The procedure by which some members of a given
population are selected as representatives of
the entire population
Why sampling rather than a census?
• Greater speed
• More accuracy
• Better use of resources (money, manpower
and time)
…cont
• Studying the whole population is impossible when the
population contains infinitely many members
• Sampling is the only choice when a test destroys
the item being tested
Definition of sampling terms
Sampling unit (element)
• A subject under observation on which information is
collected
Example: children under 5 years, hospital discharges,
and health events
…cont
Sampling fraction
• A ratio between sample size and population size
Example: 300 out of 1500 individuals (20%)
Sampling frame
• A list of all the sampling units from which a sample is
drawn
Example: Lists of all children under 5 years
Sampling scheme
• The method of selecting sampling units from the sampling
frame. It can be either a probability or a non-probability
sampling method
Sources of error in sampling
Types of errors
Non-sampling errors (bias)
• Not random in nature
• Occur in both censuses and sample surveys
• Can arise from problems of sample design, such as:
Choice of sampling frame
Choice of sampling units
• Can arise from technical faults in making observations,
recording data or processing data
Common types of non-sampling errors
Measurement errors
• Obtaining inaccurate answers to survey questions
Example:
Interviewer error
• It includes:
Recording error (when the interviewer fails to record
the correct response of the participant)
Interviewers may distort an interview (assuming they
already know what the respondent would say to a
question based on the respondent's prior
responses)
Questions may not be clearly stated
…cont
Response error
The tendency of respondents to give socially
acceptable answers
Respondents do not possess the correct information
Respondents deliberately lie
Respondents may twist their responses to make
themselves look better
Instructions are vague or unclear (more serious
when a questionnaire is used to collect data)
…cont
Processing error
Using wrong values of measurements for
analysis
Transcription error
Selection error
An error that occurs during the selection of
sample units
Measurement errors can be controlled by
using suitable, reliable, and valid instruments
Sampling errors
• Random variations in the sample estimates around
the true population parameters or
• The difference between the estimate of a value
obtained from a sample and the actual value of the
population (parameter)
• Random in nature
• Cannot be avoided
• Can be minimized by increasing sample size
• Is smaller in a homogeneous population
Factors that increase the magnitude of sampling error
• Non-representative sample
• Small sample size (below optimum size)
The larger the sample size, the smaller the
sampling error.
However, a sample that is too large is costly and offers
no advantage over the optimum sample size
• Heterogeneity of population
Random error is unavoidable
• Different samples drawn from the same population
can have different properties
• Sample is only a portion of the population we are
trying to understand
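The points above can be illustrated with a small simulation. Sampling error shrinks as n grows, it is larger in heterogeneous populations, and it never disappears. This is a minimal Python sketch under stated assumptions. The two populations are hypothetical, drawn from normal distributions with standard deviations 2 and 20. The function name `sampling_error_sd` is our own.

```python
import math
import random
import statistics

def sampling_error_sd(population, n, reps=300, seed=1):
    """Spread (standard deviation) of the sample mean over repeated simple random samples.

    Each repetition draws a different sample, so the estimates vary around the
    true population mean; this variation is the sampling error.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = rng.sample(population, n)      # one simple random sample without replacement
        means.append(math.fsum(sample) / n)     # that sample's estimate of the mean
    return statistics.stdev(means)

# Hypothetical populations of 10,000 values, both centred near 50
rng = random.Random(42)
homogeneous = [rng.gauss(50, 2) for _ in range(10_000)]
heterogeneous = [rng.gauss(50, 20) for _ in range(10_000)]

for n in (10, 100, 1000):
    print(f"n={n:5d}  homogeneous: {sampling_error_sd(homogeneous, n):.3f}  "
          f"heterogeneous: {sampling_error_sd(heterogeneous, n):.3f}")
```

The printed spread falls roughly in proportion to 1/√n. It is about ten times larger for the heterogeneous population, but it stays above zero for every sample smaller than the population.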
Sampling techniques (methods)
• Techniques or procedures for taking a sample from
the population
• If the entire population is sufficiently small, a census
is appropriate
• If the population is too large, a sample survey is
appropriate
• Sampling methods are classified into probability and
non-probability
Probability (random) sampling techniques
• Each member of the population has a known, non-zero
chance of being selected
• Representativeness and generalizability can be
achieved (standard statistical tests were developed
for such samples)
The Sampling Design Process
i. Define the Target Population
Types of random sampling methods
1. Simple random sampling
• To use this method:
the population should be homogeneous (similar)
with regard to the characteristics under
consideration
Sampling frame should be known/available
Procedures to select the sample
• How do we actually take a random sample?
• The specific procedures that you follow may vary depending
on your resources, but all involve some type of random
process.
• Depending on the complexity of the population, we can use
different tools to select n samples from the frame:
the lottery method,
a table of random numbers (available in the
appendix of many research methods and statistics
textbooks), or
computer-generated random numbers.
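The computer-generated random number approach can be sketched with Python's standard `random` module. This is an illustrative sketch only. The frame of ID numbers 1 to 1500 is hypothetical and mirrors the earlier 300-out-of-1500 sampling fraction example. The helper name `simple_random_sample` is our own.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame, every unit equally likely.

    This is the computer equivalent of the lottery method (sampling without replacement).
    """
    if n > len(frame):
        raise ValueError("sample size cannot exceed the size of the frame")
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical sampling frame: ID numbers 1..1500 (e.g. a list of children under 5)
frame = list(range(1, 1501))
sample = simple_random_sample(frame, 300, seed=2020)
print(len(sample), len(sample) / len(frame))   # 300 0.2  (sample size, sampling fraction)
```

Fixing the seed makes the draw reproducible, which helps when documenting how the sample was selected. Omit the seed to get a fresh draw each time.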
…cont
Advantage
• Base for comparing the precision of different
methods of sampling and teaching general
probabilistic sampling rules
Disadvantage
• In large populations spread over wide geographical
areas, it is difficult to list all units and select
them randomly
2. Systematic random sampling
• Units of the population are arranged in some order (e.g.
ID number or alphabetical order)
• Starting from a random point in the sampling
frame, every kth element in the frame is
selected at equal intervals (the sampling interval, $k = N/n$).
• That is, the first unit is selected randomly between 1 and k,
and every kth unit is selected after it
• A sampling frame is required to use this method
• The order of the sampling frame should not follow a
pattern related to the characteristic being studied
• The last (nth) selected unit can be computed as:
$$n^{\text{th}}\text{ unit} = N - k + 1^{\text{st}}\text{ unit}$$
where the $1^{\text{st}}$ unit is the randomly selected starting unit
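Example
Suppose a population consists of 1000 units and we
wish to take a sample of 200 units by systematic
random sampling:
$$k = \frac{N}{n} = \frac{1000}{200} = 5$$
If the randomly selected 1st unit is the 3rd unit, the selected units are the 3rd, 8th, 13th, …, and the last (200th) selected unit is the $(1000 - 5 + 3)^{\text{th}} = 998^{\text{th}}$ unit.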
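The selection rule in this example can be checked with a short Python sketch. It assumes N is a multiple of n, so that k = N/n is a whole number. The function name `systematic_sample` is our own.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Return the 1-based positions chosen by systematic random sampling.

    k = N // n is the sampling interval; the first unit is chosen at random
    between 1 and k, then every kth unit after it is taken.
    """
    k = N // n
    if start is None:
        start = random.Random(seed).randint(1, k)   # random start in 1..k
    if not 1 <= start <= k:
        raise ValueError("the random start must lie between 1 and k")
    return [start + i * k for i in range(n)]

units = systematic_sample(1000, 200, start=3)   # start fixed at 3 to match the example
print(units[:4], units[-1], len(units))          # [3, 8, 13, 18] 998 200
```

The last position equals N − k + (1st unit) = 1000 − 5 + 3, which agrees with the formula above.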
…cont
Advantage
• Very easy
• Less time consuming
Disadvantage
• The chance of selecting a non-representative sample
is high when the position of units in the list is
correlated with the characteristic to be observed
(e.g. a periodic pattern in the list that coincides
with the sampling interval)
3. Stratified random sampling
• When individual members of a population differ
from each other, the population is considered to be
heterogeneous (having significant variation among
individuals).
• The population can be divided into subpopulations called
strata
• The strata should be non-overlapping
• The strata are internally homogeneous but heterogeneous
between each other
• Strata are usually formed based on:
Age, sex, income level, occupation, educational status,
marital status, culture, etc.
…cont
• A sample is taken from each subgroup (stratum):
A simple random or systematic sample is taken
from each stratum in proportion to that stratum's
share of the population
Example: To estimate the prevalence of STI among
female sex workers in a city, we can have two strata
of FSWs: street based and hotel based
• The strata differ from each other regarding:
Percentage of high-risk behavior and
Medical consultation and available health care services
…cont
• A desired sample should be selected from each
stratum to get a precise estimate of STI
prevalence in the target population
[Figure: the FSW population divided into two strata, street-based FSWs and hotel-based FSWs]
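Proportional allocation followed by a simple random sample within each stratum can be sketched as follows. This is a hypothetical illustration. The stratum sizes (600 street-based and 400 hotel-based FSWs) and the total sample of 200 are invented for the sketch. The largest-remainder rounding rule is our own choice, made so that the allocations always add up to n.

```python
import random

def proportional_allocation(stratum_sizes, n):
    """Split a total sample size n across strata in proportion to stratum size.

    Uses largest-remainder rounding so the allocations add up to n exactly.
    """
    N = sum(stratum_sizes.values())
    raw = {s: n * size / N for s, size in stratum_sizes.items()}
    alloc = {s: int(r) for s, r in raw.items()}
    leftover = n - sum(alloc.values())
    # give any remaining units to the strata with the largest fractional parts
    for s in sorted(raw, key=lambda s: raw[s] - alloc[s], reverse=True)[:leftover]:
        alloc[s] += 1
    return alloc

def stratified_sample(frames, n, seed=None):
    """Proportionally allocated simple random sample from each stratum's frame."""
    rng = random.Random(seed)
    alloc = proportional_allocation({s: len(f) for s, f in frames.items()}, n)
    return {s: rng.sample(frames[s], alloc[s]) for s in frames}

# Hypothetical sampling frames for the two strata
frames = {
    "street-based": [f"S{i}" for i in range(600)],
    "hotel-based": [f"H{i}" for i in range(400)],
}
sample = stratified_sample(frames, 200, seed=7)
print({s: len(units) for s, units in sample.items()})
# {'street-based': 120, 'hotel-based': 80}
```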
…cont
Advantages
• Provides information for specific subdivisions of the
population
• Helpful in studies covering multiple administrative areas (each area
as a stratum)
• Dividing the population into subdivisions enables us to define
specific methods and criteria for work in each division
• The overall estimates are usually more precise
Disadvantage
• The assumption of little variation within strata (homogeneous strata)
is not easily achieved in the real world
4. Cluster sampling
• Cluster sampling can be done in:
One step
• A few clusters are selected and all units in the
selected clusters are taken
Multi stage
• Some units within the selected clusters will be chosen randomly
Example: In assessing the satisfaction of HIV-positive
patients with hospital-based health care services in
the capital city, assume that there are about 200
hospitals in Addis
…cont
• Suppose we decide to sample 20 hospitals (clusters)
• In the one-step method, 20 hospitals will be selected
and all patients from the selected hospitals will be
included in the study
• In multistage cluster sampling, 20 hospitals will first
be selected, and then patients within the selected
hospitals (clusters) will be selected randomly
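Both variants of the hospital example can be sketched in Python. The data are hypothetical: 200 hospitals, each with a random list of 30 to 80 patients, and 10 patients per selected hospital in the multistage version. The function `two_stage_cluster_sample` is our own name. Passing `None` for the within-cluster size gives one-step cluster sampling.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, m_per_cluster, seed=None):
    """Cluster sampling: SRS of clusters, then (optionally) SRS of units within each.

    clusters maps a cluster ID to the list of its units. If m_per_cluster is None,
    every unit in a selected cluster is taken (one-step cluster sampling).
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)            # stage 1: pick clusters
    sample = {}
    for c in chosen:
        units = clusters[c]
        if m_per_cluster is None or m_per_cluster >= len(units):
            sample[c] = list(units)                              # take every unit
        else:
            sample[c] = rng.sample(units, m_per_cluster)         # stage 2: pick units
    return sample

# Hypothetical data: 200 hospitals, each with 30 to 80 HIV-positive patients
rng = random.Random(0)
hospitals = {f"hosp{h:03d}": [f"hosp{h:03d}-pt{p}" for p in range(rng.randint(30, 80))]
             for h in range(200)}

one_step = two_stage_cluster_sample(hospitals, 20, None, seed=1)
multistage = two_stage_cluster_sample(hospitals, 20, 10, seed=1)
print(len(one_step), sum(len(v) for v in multistage.values()))   # 20 200
```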
…cont
Advantage
• Reduces the cost of sampling from a wide geographical
area by defining neighboring regions as clusters
Disadvantage
• Its precision is lower than that of stratified sampling,
so a bigger sample size is needed to achieve the same
precision
Selecting a sampling method
• Population to be studied
– Size/geographical distribution
– Heterogeneity with respect to variable
• Availability of list of sampling units
• Level of precision required
• Resources available
Non-probability (Non-random) sampling techniques
• Generally used in research where computation
of sampling error can be overlooked
• Used only in preliminary research or
• Used only in studies where error rates are not
considered important
• Probability theory and concepts are not used in the
method
• Members are selected from the population in some
non-random manner
When to use non-probability sampling?
• To demonstrate that a particular trait exists in the
population
• To do a qualitative, pilot or exploratory study
• When randomization is impossible (e.g. when the
population is almost infinite)
• When the aim is not to generate results that will be
used to create generalizations pertaining to the
entire population
• If we have a limited budget, time and workforce
• For an initial study that will be repeated using
randomized, probability sampling
Types of non-random sampling
1. Convenience/accidental/haphazard sampling
• Relies upon convenience and access
• Obtain a sample of convenient elements
• Respondents are selected because they happen to be
in the right place at the right time (i.e. they are easily
available)
Examples:
• Patients with a specific cancer diagnosis attending a
clinic
• Interviewing only people on the main street
2. Judgment (purposive) Sampling
Examples:
• A researcher may decide to draw the entire
sample from one "representative" city, even
though the population includes all cities
• Samples based on the clinical condition of
patients (e.g. select all hypertensives)
3. Quota sampling
• Non-probability equivalent to stratified sampling
• First, the population is divided into strata (groups), and a
quota (number of respondents) is set for each group
…cont
• The bases of the quota are usually:
Age, gender, education, race, religion and
socioeconomic status
Example
Taking college year level as a base for a quota requiring
equal representation from each level, we can take a
sample size of 100, by selecting 25 1st year students,
another 25 2nd year students, 25 3rd year and 25
4th year students
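The quota example can be sketched as a loop that accepts respondents in the order they are met until each year level's quota of 25 is full. Selection within each group is therefore not random. This is a hypothetical sketch. The stream of students is simulated, and `quota_sample` is our own name.

```python
import random

def quota_sample(stream, quotas, key):
    """Fill quotas from a non-random stream of respondents (e.g. whoever is met first).

    Respondents are accepted in arrival order until the quota for their group is full.
    """
    remaining = dict(quotas)
    sample = []
    for person in stream:
        group = key(person)
        if remaining.get(group, 0) > 0:
            sample.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):   # stop once every quota is filled
            break
    return sample, remaining

# Hypothetical stream of students met on campus: (student ID, year level)
rng = random.Random(3)
stream = [(i, rng.randint(1, 4)) for i in range(1000)]
quotas = {1: 25, 2: 25, 3: 25, 4: 25}
sample, unfilled = quota_sample(stream, quotas, key=lambda s: s[1])
print(len(sample), unfilled)   # 100 {1: 0, 2: 0, 3: 0, 4: 0}
```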
4. Snowball sampling
• Used when the population is very small
• An initial subject is used to identify other potential
subjects who also meet the eligibility criteria
of the research
• Useful when we want to reach populations that are
inaccessible or hard to find (populations with no address or
no sampling frame)
Examples:
• Sampling heroin addicts
An addict may be asked for the names of other addicts that
he or she knows
• Studying homeless people
Identify one or two homeless people and ask them to identify
other homeless people in their area
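The referral process can be pictured as following a contact network outward, wave by wave, from the initial subjects. This is a minimal sketch. The referral network is hypothetical, the breadth-first order of recruitment is our own simplifying assumption, and `snowball_sample` is our own name.

```python
from collections import deque

def snowball_sample(referrals, seeds, target_size, is_eligible=lambda p: True):
    """Grow a sample by following referrals from initial subjects, wave by wave.

    referrals maps each person to the people they can name; ineligible people
    are skipped and their contacts are not followed.
    """
    recruited = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(recruited) < target_size:
        person = queue.popleft()
        if not is_eligible(person):
            continue
        recruited.append(person)
        for contact in referrals.get(person, []):
            if contact not in seen:          # do not recruit the same person twice
                seen.add(contact)
                queue.append(contact)
    return recruited

# Hypothetical referral network: who each person can name
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "E": ["F"]}
print(snowball_sample(referrals, ["A"], target_size=5))   # ['A', 'B', 'C', 'D', 'E']
```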
Advantages and Disadvantages of Probability and
Non-probability sampling methods
[Table: sampling methods with their advantages and disadvantages]
Summary
• Probability sampling methods are the best: they
ensure
– Representativeness
– Precision
…within available constraints
• Non-probability sampling methods could be
used for exploratory/preliminary research
Exercise
Sample Size Determination
• Taking a larger sample than is needed to
achieve the desired results wastes
resources
• Very small samples often lead to wrong
conclusions
• Thus, an optimum (adequate) sample size is
recommended
Rules of thumb for determining the sample size
1. The larger the population size, the smaller the percentage of the
population required to get a representative sample.
2. For small populations (N < 100), there is little point in sampling;
survey the entire population.
3. If the population size is around 500 (give or take 100), 50%
should be sampled.
4. If the population size is around 1500, 20% should be sampled.
5. Beyond a certain point (N = 5000), the population size is almost
irrelevant and a sample size of 400 may be adequate.
Methods of sample size determination
1. Precision based sample size determination
• If our aim is to estimate an unknown population parameter
(population mean or proportion),
our sample size determination is related to estimation.
The method is said to be precision based sample size
determination.
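Precision based sample size determination
• When the finite population correction (FPC) is considered:
$$n_f = \frac{n}{1 + \frac{n - 1}{N}}$$
where n is the sample size required for a very large population and N is the population size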
Example
A health officer wishes to estimate the mean
haemoglobin level in a defined community.
Preliminary information shows that the mean is
about 150 g/l with a standard deviation of 32 g/l.
If a sampling error of up to 5 g/l in the estimate is
to be tolerated at the 95% confidence level, how many
subjects should be included in the study?
Solution:
Here, s = 32 g/l and E = 5 g/l
$\alpha = 0.05$, so $z_{\alpha/2} = z_{0.025} = 1.96$
…cont
If the population is assumed to be very large, the
required minimum sample size would be:
$$n = \left(\frac{z_{\alpha/2}\, s}{E}\right)^2 = \left(\frac{1.96 \times 32}{5}\right)^2 = 157.4 \approx 158$$
If the community to be sampled has 1000 people, the
required minimum sample size would be:
$$n_f = \frac{n}{1 + \frac{n - 1}{N}} = \frac{158}{1 + \frac{157}{1000}} = \frac{158}{1.157} = 136.6 \approx 137$$
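Both steps of the haemoglobin example can be reproduced in a few lines of Python. This is a minimal sketch. z is passed in as the tabled value 1.96 to match the hand calculation; `statistics.NormalDist().inv_cdf(0.975)` would give the unrounded 1.95996…. The function names are our own. `round_up` rounds to 6 decimals before taking the ceiling, so floating-point noise cannot push an exact whole number up by one.

```python
import math

def round_up(x):
    """Round a sample size up to the next whole subject, ignoring float noise."""
    return math.ceil(round(x, 6))

def n_for_mean(s, E, z=1.96):
    """Sample size to estimate a mean within +/- E: n = (z * s / E)^2."""
    return (z * s / E) ** 2

def finite_population_correction(n, N):
    """Adjusted sample size for a population of size N: n / (1 + (n - 1) / N)."""
    return n / (1 + (n - 1) / N)

n = round_up(n_for_mean(s=32, E=5))                        # 157.35 -> 158
n_f = round_up(finite_population_correction(n, N=1000))    # 136.56 -> 137
print(n, n_f)   # 158 137
```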
Sample size determination for a qualitative/categorical response
variable: Estimating a single population proportion (p)
…cont
a) What sample size is required if a previous survey
shows that 15% of adults were allergic to trees,
weeds, flowers, and grasses?
b) What would the sample size be, for the same
degree of confidence and same maximum allowable
error, if no such previous survey had been taken?
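Solution:
a) p = 0.15, q = 0.85 and E = 3% = 0.03
$\alpha = 0.01$, so $z_{\alpha/2} = z_{0.005} = 2.58$
$$n = \frac{z_{\alpha/2}^2\, p\, q}{E^2} = \frac{2.58^2 \times 0.15 \times 0.85}{0.03^2} = \frac{0.8487}{0.0009} \approx 943$$
b) With no previous survey, take p = q = 0.5, which gives the largest possible sample size:
$$n = \frac{z_{\alpha/2}^2\, p\, q}{E^2} = \frac{2.58^2 \times 0.5 \times 0.5}{0.03^2} = \frac{1.6641}{0.0009} = 1849$$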
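Both parts of the allergy example can be checked in Python. This is a minimal sketch. The function names are our own, and z = 2.58 is passed explicitly to match the 99% confidence level used above. The rounding guard in `round_up` matters here: without it, floating-point noise in 2.58² × 0.25 / 0.03² could turn the exact 1849 into 1850.

```python
import math

def round_up(x):
    """Round up to a whole subject, ignoring float noise such as 1849.0000000000002."""
    return math.ceil(round(x, 6))

def n_for_proportion(p, E, z=1.96):
    """Sample size to estimate a proportion within +/- E: n = z^2 p (1 - p) / E^2."""
    return z ** 2 * p * (1 - p) / E ** 2

n_a = round_up(n_for_proportion(p=0.15, E=0.03, z=2.58))   # previous survey: p = 0.15
n_b = round_up(n_for_proportion(p=0.5, E=0.03, z=2.58))    # no prior estimate: p = 0.5
print(n_a, n_b)   # 943 1849
```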
2. Power based sample size determination
• Analytical study design
• The primary purpose of an analytical study is to test
(one or more) null hypotheses
• Determination of the sample size requires the
specification of:
Significance level (probability of committing type I
error-rejecting true null hypothesis, α )
Power of the test, (probability of rejecting false null
hypothesis-correct decision, 1-β)
Margin of error and
Probability distribution of the estimator.
Cont…
• If our aim is to test a hypothesis about an unknown
population parameter,
our sample size determination is related to
hypothesis testing.
• The method is said to be power based sample
size determination.
• The minimum statistical power conventionally required
for a hypothesis test is 80%.
1. Sample size for testing equality of two means
(quantitative response variable)
• The hypotheses to be tested are:
$H_0: \mu_1 - \mu_2 = 0$ vs $H_1: \mu_1 - \mu_2 \neq 0$
• If the two groups have equal variance ($\sigma^2$), the sample size
per group is given by:
$$n = \frac{2\,(Z_{\alpha/2} + Z_{\beta})^2\,\sigma^2}{(\mu_1 - \mu_2)^2}$$
Cont…
• If the two groups have different variances, the sample
size per group is given by:
$$n = \frac{(Z_{\alpha/2} + Z_{\beta})^2\,(\sigma_1^2 + \sigma_2^2)}{(\mu_1 - \mu_2)^2}$$
Example
A trial was designed to assess the effectiveness of a new
therapy for severe sepsis and septic shock. The clinicians
measure the effectiveness of the therapies using mean arterial
pressure and wish to detect a difference of at least
14 mmHg between the two groups. Assuming the standard
deviation in both groups is 20 mmHg, in order to detect
a difference of this magnitude at the 5% significance
level (95% confidence) with 80% power, how many patients
are required in the treatment (new therapy) and control
(standard therapy) groups?
cont…
The study will require 32 patients in each group:
$$n = \frac{2\,(Z_{\alpha/2} + Z_{\beta})^2\,\sigma^2}{(\mu_1 - \mu_2)^2} = \frac{2 \times (1.96 + 0.84)^2 \times 20^2}{14^2} = \frac{6272}{196} = 32$$
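The two formulas for comparing means can be sketched in Python and checked against the sepsis example. The function names are our own. Z values default to the tabled 1.96 (two-sided α = 0.05) and 0.84 (80% power) used in the slides.

```python
import math

def round_up(x):
    """Round up to a whole number of patients, ignoring float noise such as 31.999999999999996."""
    return math.ceil(round(x, 6))

def n_two_means_equal_var(sd, diff, z_alpha=1.96, z_beta=0.84):
    """Per-group n: 2 (Z_a/2 + Z_b)^2 sd^2 / (mu1 - mu2)^2."""
    return 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / diff ** 2

def n_two_means_unequal_var(sd1, sd2, diff, z_alpha=1.96, z_beta=0.84):
    """Per-group n: (Z_a/2 + Z_b)^2 (sd1^2 + sd2^2) / (mu1 - mu2)^2."""
    return (z_alpha + z_beta) ** 2 * (sd1 ** 2 + sd2 ** 2) / diff ** 2

n = round_up(n_two_means_equal_var(sd=20, diff=14))
print(n)   # 32
```

With sd1 = sd2 the two functions give the same result, because the unequal-variance formula reduces to the equal-variance one.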
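2. Sample size for testing equality of two proportions
(qualitative/categorical response variable)
• The hypotheses to be tested are:
$H_0: p_1 - p_2 = 0$ vs $H_1: p_1 - p_2 \neq 0$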
Cont…
• If the two groups have equal variances, the sample size
per group is given by:
$$n = \frac{2\,(Z_{\alpha/2} + Z_{\beta})^2\,\bar{p}(1 - \bar{p})}{(p_1 - p_2)^2}, \quad \bar{p} = \frac{p_1 + p_2}{2}$$
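Cont…
• If the two groups have different variances, the
sample size per group is given by:
$$n = \frac{\left[Z_{\alpha/2}\sqrt{2\bar{p}(1 - \bar{p})} + Z_{\beta}\sqrt{p_1(1 - p_1) + p_2(1 - p_2)}\right]^2}{(p_1 - p_2)^2}$$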
Example
• Consider a study investigating the effectiveness
of aspirin in reducing the mortality rate due to
myocardial infarction (heart attacks).
• Let $p_1$ denote the proportion of deaths among aspirin
users and $p_2$ the proportion of deaths among nonusers.
Cont…
Previous studies indicate that the proportion of
deaths due to heart attacks is 0.015 for nonusers
and 0.001 for users. Investigators wish to determine
the sample size required to detect an absolute
difference of |0.001 − 0.015| = 0.014 with 80%
power using a two-sided 5% level of significance
test.
In order to detect a difference of this magnitude,
calculate the sample size required.
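Cont…
• Assuming different variances in the two populations (with $\bar{p} = (0.001 + 0.015)/2 = 0.008$):
$$n = \frac{\left[1.96\sqrt{2(0.008)(1 - 0.008)} + 0.84\sqrt{0.001(1 - 0.001) + 0.015(1 - 0.015)}\right]^2}{(0.001 - 0.015)^2} = \frac{0.124206}{0.000196} = 633.7 \approx 634$$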
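Cont…
• Assuming the same variance in the two populations:
$$n = \frac{2\,(Z_{\alpha/2} + Z_{\beta})^2\,\bar{p}(1 - \bar{p})}{(p_1 - p_2)^2} = \frac{2\,(1.96 + 0.84)^2 \times 0.008(1 - 0.008)}{(0.001 - 0.015)^2} = \frac{0.124436}{0.000196} = 634.88 \approx 635$$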
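Both versions of the aspirin calculation can be reproduced in Python. The function names are our own. p̄ is the simple average of the two proportions, as in the formulas above.

```python
import math

def round_up(x):
    """Round up to a whole subject, ignoring floating-point noise."""
    return math.ceil(round(x, 6))

def n_two_props_equal_var(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Per-group n assuming a common variance p_bar (1 - p_bar)."""
    p_bar = (p1 + p2) / 2
    return 2 * (z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) / (p1 - p2) ** 2

def n_two_props_unequal_var(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Per-group n with the pooled variance under H0 and separate variances under H1."""
    p_bar = (p1 + p2) / 2
    null_part = z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
    alt_part = z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return (null_part + alt_part) ** 2 / (p1 - p2) ** 2

# Aspirin example: p1 = 0.001 (users), p2 = 0.015 (nonusers)
print(round_up(n_two_props_unequal_var(0.001, 0.015)),   # 634
      round_up(n_two_props_equal_var(0.001, 0.015)))     # 635
```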
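Sample size for testing equality of the proportions exposed
among cases and controls: Case-control study design
• The hypotheses to be tested are $H_0: p_1 = p_2$ vs $H_1: p_1 \neq p_2$, or
equivalently:
$H_0: OR = 1$ vs $H_1: OR \neq 1$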
Cont…
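• The sample size required for a case-control study can be determined
by using the Kelsey formula as follows:
• Sample size for the case group:
$$n_1 = \frac{r + 1}{r} \cdot \frac{\bar{p}(1 - \bar{p})\,(Z_{\alpha/2} + Z_{\beta})^2}{(p_1 - p_2)^2}$$
• Sample size for the control group:
$$n_2 = r \times n_1$$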
Cont…
• Where:
r = ratio of controls to cases (r = 1 for equal numbers of controls
and cases)
$\bar{p} = \frac{p_1 + p_2}{2}$ is the average proportion exposed to the risk factor
under question for the entire pooled population
$p_1$ is the proportion of cases exposed to the risk factor under question
$p_2$ is the proportion of controls exposed to the risk factor under question
Example
A researcher wants to see the effect of childhood sexual
abuse (risk factor under question) on psychiatric disorder
in adulthood. The researcher will retrospectively assess
childhood sexual abuse in cases (adults with
psychiatric disorders) and controls (adults without
psychiatric disorders) to compare the proportion of cases
exposed to childhood sexual abuse with the proportion of
controls exposed to childhood sexual abuse. Suppose that
35% of the cases and 20% of the controls were exposed to
childhood sexual abuse. At the 5% significance level and 80% power
of the test, calculate the sample size for this study assuming
equal sample size allocation to case and control groups.
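Cont…
• Sample size for the case group, with $p_1 = 0.35$, $p_2 = 0.20$ and r = 1:
$$\bar{p} = \frac{p_1 + p_2}{2} = \frac{0.35 + 0.20}{2} = 0.275$$
$$n_1 = \frac{1 + 1}{1} \cdot \frac{0.275(1 - 0.275)(1.96 + 0.84)^2}{(0.35 - 0.20)^2} = 138.9 \approx 139$$
• Since r = 1, the control group also needs $n_2 = 139$.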
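The Kelsey calculation can be sketched in Python. The function follows the formula exactly as written in these slides, including the simple average p̄ = (p1 + p2)/2. The name `kelsey_case_control` is our own.

```python
import math

def round_up(x):
    """Round up to a whole subject, ignoring floating-point noise."""
    return math.ceil(round(x, 6))

def kelsey_case_control(p1, p2, r=1, z_alpha=1.96, z_beta=0.84):
    """Return (n_cases, n_controls) from the Kelsey formula as given above.

    p1, p2: proportions exposed among cases and controls; r: controls per case.
    """
    p_bar = (p1 + p2) / 2                      # average proportion exposed
    n1 = (r + 1) / r * p_bar * (1 - p_bar) * (z_alpha + z_beta) ** 2 / (p1 - p2) ** 2
    n_cases = round_up(n1)
    return n_cases, round_up(r * n_cases)      # n2 = r * n1

print(kelsey_case_control(0.35, 0.20))   # (139, 139)
```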
Another version of the above example is as follows:
A researcher wants to see the effect of childhood sexual
abuse (risk factor under question) on psychiatric disorder
in adulthood. The researcher will retrospectively assess
childhood sexual abuse in cases (adults with
psychiatric disorders) and controls (adults without
psychiatric disorders) to compare the proportion of cases
exposed to childhood sexual abuse with the proportion of
controls exposed to childhood sexual abuse.
Suppose that 20% of the controls were exposed to childhood
sexual abuse and we want to detect an odds ratio (OR) of 2.0
or greater at the 5% significance level and 80% power of the test.
Calculate the sample size for this study assuming equal sample size
allocation to case and control groups.
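Cont…
• The proportion of cases exposed to childhood sexual
abuse can be estimated as follows:
$$p_1 = \frac{p_2 \times OR}{1 + p_2(OR - 1)} = \frac{0.2 \times 2}{1 + 0.2(2 - 1)} = \frac{0.4}{1.2} = 0.33$$
$$\bar{p} = \frac{p_1 + p_2}{2} = \frac{0.33 + 0.2}{2} = 0.265$$
$$n_1 = \frac{r + 1}{r} \cdot \frac{\bar{p}(1 - \bar{p})\,(Z_{\alpha/2} + Z_{\beta})^2}{(p_1 - p_2)^2} = \frac{1 + 1}{1} \cdot \frac{0.265(1 - 0.265)(1.96 + 0.84)^2}{(0.33 - 0.2)^2} = 180.7 \approx 181$$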
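The odds-ratio version can be sketched in two steps: convert the target OR into p1, then apply the Kelsey formula. This is a minimal sketch with our own function names. p1 is rounded to two decimals only to reproduce the hand calculation.

```python
import math

def round_up(x):
    """Round up to a whole subject, ignoring floating-point noise."""
    return math.ceil(round(x, 6))

def p1_from_odds_ratio(p2, odds_ratio):
    """Proportion exposed among cases implied by p2 (controls) and a target OR."""
    return p2 * odds_ratio / (1 + p2 * (odds_ratio - 1))

def kelsey_n_cases(p1, p2, r=1, z_alpha=1.96, z_beta=0.84):
    """Unrounded number of cases from the Kelsey formula as given in these slides."""
    p_bar = (p1 + p2) / 2
    return (r + 1) / r * p_bar * (1 - p_bar) * (z_alpha + z_beta) ** 2 / (p1 - p2) ** 2

p1 = round(p1_from_odds_ratio(0.2, 2.0), 2)    # 0.333... rounded to 0.33 as on the slide
print(p1, round_up(kelsey_n_cases(p1, 0.2)))    # 0.33 181
```

If p1 is kept unrounded at 1/3, the same formula gives about 172.5, so 173 cases. Rounding p1 down to 0.33 in the hand calculation therefore makes the sample somewhat larger, which is the conservative direction.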
Sample size for testing equality of the proportion of
events among the exposed and unexposed groups:
Cohort study design
• We are interested in testing the hypothesis:
$H_0: RR = 1$ vs $H_1: RR \neq 1$
• Information required:
Power (1 − β): the probability of detecting a real effect (a true
relative risk or experimental event rate). Common
values are 80% or 90%.
Significance level (α): the fixed probability of a type I error
(rejecting a true null hypothesis). Common values are 5% or
1%.
Cont…
$P_0$: probability of an event (e.g. death/disease) in non-exposed
subjects/controls.
It can be estimated by the prevalence in the population under
study.
$P_1$: probability of an event (e.g. death/disease) in
exposed/experimental subjects.
$RR = P_1/P_0$: relative risk of events between
exposed/experimental subjects and non-exposed
subjects/controls; it is used to obtain $P_1$ if $P_1$ is not known.
Cont…
• In this case, we specify the relative risk (RR) that we
want to detect with the required statistical power and then
calculate $p_1$ as follows:
$$RR = \frac{p_1}{p_0} \quad \Rightarrow \quad p_1 = RR \times p_0$$
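Cont…
• The sample size for the exposed group ($n_1$) is given
by:
$$n_1 = \frac{\left[Z_{\alpha/2}\sqrt{\left(1 + \frac{1}{m}\right)\bar{p}(1 - \bar{p})} + Z_{\beta}\sqrt{\frac{p_0(1 - p_0)}{m} + p_1(1 - p_1)}\right]^2}{(p_0 - p_1)^2}$$
$$\bar{p} = \frac{p_1 + m\,p_0}{m + 1}, \quad \text{which reduces to } \bar{p} = \frac{p_0 + p_1}{2} \text{ if } m = 1 \text{ (equal sample size)}$$
• The sample size for the non-exposed group is $n_2 = m \times n_1$, where m is the number of non-exposed subjects per exposed subject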
Example
Suppose that a researcher wants to see the impact of
weight training exercise on cardiovascular mortality.
According to the previous studies, the proportion of
cardiovascular death in people who participate in weight
training exercise (exposed) is 20% and in people who did
not participate in weight training exercise (unexposed) is
40%.
At 5% significance level, 80% statistical power with
equal number of exposed and non-exposed, calculate
sample size for this study.
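Cont…
• Sample size for the exposed group ($n_1$), with $p_1 = 0.2$, $p_0 = 0.4$, m = 1 and $\bar{p} = \frac{0.4 + 0.2}{2} = 0.3$:
$$n_1 = \frac{\left[1.96\sqrt{\left(1 + \frac{1}{1}\right)0.3(1 - 0.3)} + 0.84\sqrt{\frac{0.4(1 - 0.4)}{1} + 0.2(1 - 0.2)}\right]^2}{(0.4 - 0.2)^2} = 81.1 \approx 82$$
• Since m = 1, the non-exposed group also needs $n_2 = 82$.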
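The cohort formula can be sketched in Python and checked against the weight-training example. `n_cohort` is our own name. p̄ is the weighted average (p1 + m·p0)/(m + 1) from the formula above. The example obtains p1 from a target relative risk of 0.5, since 0.2 / 0.4 = 0.5.

```python
import math

def round_up(x):
    """Round up to a whole subject, ignoring floating-point noise."""
    return math.ceil(round(x, 6))

def n_cohort(p1, p0, m=1, z_alpha=1.96, z_beta=0.84):
    """Return (n_exposed, n_unexposed) for a cohort study.

    p1, p0: event probabilities in exposed and unexposed groups;
    m: number of unexposed subjects per exposed subject.
    """
    p_bar = (p1 + m * p0) / (m + 1)
    null_part = z_alpha * math.sqrt((1 + 1 / m) * p_bar * (1 - p_bar))
    alt_part = z_beta * math.sqrt(p0 * (1 - p0) / m + p1 * (1 - p1))
    n1 = round_up((null_part + alt_part) ** 2 / (p0 - p1) ** 2)
    return n1, round_up(m * n1)

p0 = 0.4
p1 = 0.5 * p0             # p1 = RR * p0 with a target RR of 0.5
print(n_cohort(p1, p0))   # (82, 82)
```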
Exercise
• Determine a sufficient sample size for your
proposed research topic.