Sample Size Calculation
ABSTRACT
Calculation of exact sample size is an important part of research design. It is very important to understand that different
study designs need different methods of sample size calculation and that one formula cannot be used for all designs. In this
short review we try to educate researchers about the various methods of sample size calculation available for different
study designs. Sample size calculations for the most frequently used study designs are described. For genetic
and microbiological studies, readers are requested to consult other sources.
Department of Pharmacology, Govt. Medical College, Surat, Gujarat, 1Independent Researcher, Kolkata, West Bengal, India
Indian Journal of Psychological Medicine | Apr - Jun 2013 | Vol 35 | Issue 2 121
Charan and Biswas: Sample size calculations for different study designs
$$\text{Sample size} = \frac{Z_{1-\alpha/2}^{2}\; p\,(1-p)}{d^{2}}$$

Here
$Z_{1-\alpha/2}$ = Standard normal variate (at 5% type 1 error (P<0.05) it is 1.96 and at 1% type 1 error (P<0.01) it is 2.58). As P values below 0.05 are considered significant in the majority of studies, 1.96 is used in the formula.
p = Expected proportion in the population, based on previous studies or pilot studies.
d = Absolute error or precision, which has to be decided by the researcher.

For example, let us assume that a researcher wants to estimate the proportion of patients having hypertension in the paediatric age group in a city. According to previously published studies, the actual proportion of hypertensives may not be more than 15%. The researcher wants to calculate the sample size with a precision/absolute error of 5% and a type 1 error of 5%. So if we use the above formula

$$\text{Sample size} = \frac{1.96^{2} \times 0.15\,(1-0.15)}{0.05^{2}} = 196$$

So for this cross sectional study the researcher has to take at least 196 subjects. If the researcher wants to increase the error (decrease the precision), then the denominator will increase and hence the sample size will decrease.

For quantitative variable
Suppose the same researcher is interested in knowing the average systolic blood pressure of children of the same city; then the formula below should be used, as blood pressure is a quantitative variable.

$$\text{Sample size} = \frac{Z_{1-\alpha/2}^{2}\; SD^{2}}{d^{2}}$$

$Z_{1-\alpha/2}$ = Standard normal variate, as mentioned in the previous section.
SD = Standard deviation of the variable. The value of the standard deviation can be taken from a previously done study or from a pilot study.
d = Absolute error or precision, as mentioned in the previous section.

So if the researcher is interested in knowing the average systolic blood pressure in the paediatric age group of that city at 5% type 1 error and a precision of 5 mmHg on either side (more or less than the mean systolic BP), and the standard deviation, based on previously done studies, is 25 mmHg, then the formula for sample size calculation will be

$$\text{Sample size} = \frac{1.96^{2}\,(25)^{2}}{5^{2}} \approx 96$$

So the researcher will have to take the blood pressure of at least 96 children to know the average systolic blood pressure in the paediatric age group.

Sample size calculation for case control studies
In case control studies, cases (the group with the disease/condition under consideration) are compared with controls (the group without the disease/condition under consideration) regarding exposure to the risk factor under question.

The formula for sample size calculation for this design also depends on the type of variable (qualitative or quantitative). Here the formula for an independent case control study is mentioned. To read about these formulae in more detail, other texts should be referred to.[7,8]

For qualitative variable
Suppose a researcher wants to see the link between childhood sexual abuse and psychiatric disorder in adulthood. He will take a sample of adults with psychiatric disorder and another sample of normal adults having no psychiatric disorder. He will then look retrospectively for a history of childhood sexual abuse in both groups. Exposure in both groups will be compared and the odds ratio will be calculated. Here the number of people exposed to childhood sexual abuse is a qualitative variable, hence this formula will be used for this type of design

$$\text{Sample size} = \frac{r+1}{r}\;\frac{p^{*}(1-p^{*})\,(Z_{\beta}+Z_{\alpha/2})^{2}}{(p_{1}-p_{2})^{2}}$$

r = Ratio of controls to cases, 1 for equal numbers of cases and controls
p* = Average proportion exposed = (proportion of exposed cases + proportion of exposed controls)/2
$Z_{\beta}$ = Standard normal variate for power: for 80% power it is 0.84 and for 90% power it is 1.28. The researcher has to select the power for the study.
$Z_{\alpha/2}$ = Standard normal variate for level of significance, as mentioned in the previous section.
$p_{1} - p_{2}$ = Effect size or difference in proportions expected, based on previous studies. $p_{1}$ is the proportion in cases and $p_{2}$ is the proportion in controls.

So if the researcher wants to calculate the sample size for the above-mentioned case control study on the link between childhood sexual abuse and psychiatric
disorder in adulthood, fixes the power of the study at 80%, assumes that the expected proportions in the case group and control group are 0.35 and 0.20 respectively, and wants equal numbers of cases and controls, then the sample size per group will be

$$\text{Sample size} = \frac{2}{1}\;\frac{p^{*}(1-p^{*})\,(0.84+1.96)^{2}}{(0.35-0.20)^{2}}$$

p* = Average proportion exposed = (proportion of exposed cases + proportion of exposed controls)/2 = (0.35 + 0.20)/2 = 0.275

So

$$\text{Sample size} = \frac{2}{1}\;\frac{(0.275)(1-0.275)\,(0.84+1.96)^{2}}{(0.35-0.20)^{2}} = 138.9$$

So the researcher has to recruit at least 139 subjects as cases and an equal number as controls, as he wants to have equal numbers in both.

For quantitative variable
Suppose a researcher wants to see the association between birth weight and diabetes in adulthood. Birth weight being quantitative data, the researcher will select one group, i.e. cases, who will be diabetic adults, and another group, i.e. controls, who will be non-diabetic adults. Both groups will be traced back for data regarding childhood weight. The formula for sample size calculation is

$$\text{Sample size} = \frac{r+1}{r}\;\frac{SD^{2}\,(Z_{\beta}+Z_{\alpha/2})^{2}}{d^{2}}$$

SD = Standard deviation; the researcher can take the value from previously published studies.
d = Expected mean difference between cases and controls (may be based on previously published studies).
r, $Z_{\beta}$ and $Z_{\alpha/2}$ are already explained in previous sections.

So if the researcher thinks that the difference in mean weight between cases and controls may be around 250 g and the SD is 1 kg, then considering equal numbers of cases and controls and 80% power the sample size will be

$$\text{Sample size} = \frac{2}{1}\;\frac{1^{2}\,(0.84+1.96)^{2}}{0.25^{2}} = 250.88$$

Hence the researcher has to take 251 subjects in each group (cases and controls).

Sample size calculation of cohort studies
In cohort studies, healthy subjects with or without exposure to some risk factor are observed over a time period to see the event rate in both groups. If a researcher wants to see the impact of weight training exercise on cardiovascular mortality, then he will select two groups, one consisting of subjects who exercise and another consisting of those who do not. These groups will be followed up for a specific time period to see cardiovascular mortality in both groups. At the end of the study period both groups will be compared for cardiovascular mortality. The formula for sample size is

$$\text{Sample size} = \frac{\left[ Z_{\alpha}\sqrt{\left(1+\frac{1}{m}\right) p^{*}(1-p^{*})} + Z_{\beta}\sqrt{\frac{p_{1}(1-p_{1})}{m} + p_{2}(1-p_{2})} \right]^{2}}{(p_{1}-p_{2})^{2}}$$

$Z_{\alpha}$ = Standard normal variate for level of significance
m = Number of control subjects per experimental subject
$Z_{\beta}$ = Standard normal variate for power or type 2 error, as explained in the earlier section
$p_{1}$ = Probability of events in the control group
$p_{2}$ = Probability of events in the experimental group

$$p^{*} = \frac{p_{2} + m\,p_{1}}{m+1}$$

So suppose the researcher wants to see the impact of weight training exercise on cardiovascular mortality, and according to previous studies the proportion of cardiovascular deaths may be around 20% in the experimental (exercise) group and around 40% in the control group, so that p* = (0.20 + 0.40)/2 = 0.30. The sample size calculation for a 5% significance level and 80% power with equal numbers in both groups will be

$$\text{Sample size} = \frac{\left[ 1.96\sqrt{\left(1+\frac{1}{1}\right) 0.30\,(1-0.30)} + 0.84\sqrt{\frac{0.40\,(1-0.40)}{1} + 0.20\,(1-0.20)} \right]^{2}}{(0.40-0.20)^{2}} = \frac{(1.2702 + 0.5313)^{2}}{0.04} \approx 81.13$$

So the researcher has to take at least 82 subjects in each group.

It is worth mentioning here that these formulae for case control and cohort studies are for independent designs. They are not for matched case control and cohort studies. These formulae can be modified or corrected depending on the population size or the ratio between sample size and population size. Detailed texts should be read to know more about the technical aspects of sample size calculation.[7,8] Readers are advised to use the various freely available epidemiological calculators, like OpenEpi given in the appendix, to calculate sample size.
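The estimation and case control formulas above lend themselves to a quick numerical check. The following is a minimal Python sketch, not part of the original article: the function names are hypothetical, and the Z values 1.96 and 0.84 are hard-coded to match the rounded table values used in the text rather than computed from the normal distribution.

```python
import math

Z_ALPHA_2 = 1.96  # two-sided 5% type 1 error, rounded value used in the text
Z_BETA = 0.84     # 80% power, rounded value used in the text


def n_for_proportion(p, d, z=Z_ALPHA_2):
    """Subjects needed to estimate a proportion p with absolute precision d."""
    return z ** 2 * p * (1 - p) / d ** 2


def n_for_mean(sd, d, z=Z_ALPHA_2):
    """Subjects needed to estimate a mean with standard deviation sd and precision d."""
    return z ** 2 * sd ** 2 / d ** 2


def n_case_control_proportion(p1, p2, r=1, z_alpha=Z_ALPHA_2, z_beta=Z_BETA):
    """Cases needed in an unmatched case control study with a qualitative exposure.

    p1, p2: proportion exposed among cases and controls; r: controls per case.
    """
    p_star = (p1 + p2) / 2  # average proportion exposed
    return (r + 1) / r * p_star * (1 - p_star) * (z_beta + z_alpha) ** 2 / (p1 - p2) ** 2


def n_case_control_mean(sd, d, r=1, z_alpha=Z_ALPHA_2, z_beta=Z_BETA):
    """Cases needed in an unmatched case control study with a quantitative exposure."""
    return (r + 1) / r * sd ** 2 * (z_beta + z_alpha) ** 2 / d ** 2


# Worked examples from the text
print(math.ceil(n_for_proportion(0.15, 0.05)))            # hypertension prevalence
print(round(n_for_mean(25, 5), 2))                        # mean systolic BP
print(math.ceil(n_case_control_proportion(0.35, 0.20)))   # abuse and psychiatric disorder
print(math.ceil(n_case_control_mean(1.0, 0.25)))          # birth weight and diabetes
```

In practice the result is rounded up to the next whole subject. The mean systolic blood pressure example evaluates to 96.04 before rounding, which the text reports as 96.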
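The cohort formula can be checked in the same way. This is a sketch under the assumption that both bracketed terms are taken under a square root, which is the usual form of this formula for comparing two proportions with m controls per experimental subject. The name `n_cohort` is hypothetical, and the Z values are the rounded ones used in the text.

```python
import math


def n_cohort(p1, p2, m=1, z_alpha=1.96, z_beta=0.84):
    """Experimental subjects needed in an unmatched cohort study.

    p1: event probability in the control group
    p2: event probability in the experimental group
    m:  number of control subjects per experimental subject
    Assumes the square-root form of the formula.
    """
    p_star = (p2 + m * p1) / (m + 1)
    term_alpha = z_alpha * math.sqrt((1 + 1 / m) * p_star * (1 - p_star))
    term_beta = z_beta * math.sqrt(p1 * (1 - p1) / m + p2 * (1 - p2))
    return (term_alpha + term_beta) ** 2 / (p1 - p2) ** 2


# Weight training example: 40% mortality in controls, 20% in the exercise group
n = n_cohort(0.40, 0.20)
print(round(n, 2), math.ceil(n))
```

With these inputs the expression evaluates to about 81.1, i.e. 82 subjects per group after rounding up. It is worth recomputing any hand-worked figure this way, since the nested square roots make arithmetic slips easy.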
Sample size calculation for testing a hypothesis (Clinical trials or clinical interventional studies)
In this kind of research design the researcher wants to see the effect of an intervention. Suppose a researcher wants to see the effect of an antihypertensive drug, so he will select two groups: one group will be given the antihypertensive drug and another group will be given placebo. After giving these drugs for a fixed time period, the blood pressure of both groups will be measured and the mean blood pressure of both groups will be compared to see if the difference is significant or not. Complex formulae are used for this type of study and we advise readers to use statistical software for calculation of exact sample size. The procedure for calculating sample size in clinical trials/intervention studies involving two groups is described here; if the design involves more than two groups, then statistical software like G*Power should be used for sample size calculation. In either case, understanding the various prerequisites needed for sample size calculation is very important.

Formula for sample size calculation for comparison between two groups when endpoint is quantitative data
When the variable is quantitative data like blood pressure, weight, height, etc., then the following formula can be used for calculation of sample size for comparison between two groups.

$$\text{Sample size} = \frac{2\,SD^{2}\,(Z_{\alpha/2}+Z_{\beta})^{2}}{d^{2}}$$

SD = Standard deviation, from previous studies or a pilot study
$Z_{\alpha/2} = Z_{0.05/2} = Z_{0.025} = 1.96$ (from Z table) at type 1 error of 5%
$Z_{\beta} = Z_{0.20} = 0.842$ (from Z table) at 80% power
d = Effect size = difference between mean values

So now the formula will be

$$\text{Sample size} = \frac{2\,SD^{2}\,(1.96+0.84)^{2}}{d^{2}}$$

For example, suppose a researcher wants to see the effect of a potential antihypertensive drug and he wants to compare the new drug with placebo. The researcher thinks that if this new drug reduces blood pressure by 10 mmHg as compared to placebo then it should be considered clinically significant. Let us assume the standard deviation found in previously done studies was 25 mmHg. Suppose the researcher selects the level of significance at 5% and the power of the study at 80%, and he thinks the suitable statistical test in this condition will be a two tailed unpaired t test. The effect size in this condition is 10 mmHg. Hence the sample size will be

$$\text{Sample size} = \frac{2\,(25)^{2}\,(1.96+0.84)^{2}}{10^{2}} = 98$$

So in this case the researcher needs 98 subjects per group.

Formula for sample size calculation for comparison between two groups when endpoint is qualitative
When the endpoint of a clinical intervention study is qualitative, like alive/dead, diseased/non diseased, male/female etc., then the following formula can be used for sample size calculation for comparison between two groups. Suppose the researcher is interested in knowing the protective effect of a drug on mortality in patients with myocardial infarction. He selects two groups of patients with myocardial infarction; one group is given the drug and the other group is given placebo. Both groups are kept under observation and at the end of the study deaths in both groups are compared. For the sample size of this type of study the formula below can be used.

$$\text{Sample size} = \frac{2\,(Z_{\alpha/2}+Z_{\beta})^{2}\,P(1-P)}{(p_{1}-p_{2})^{2}}$$

$Z_{\alpha/2} = Z_{0.05/2} = Z_{0.025} = 1.96$ (from Z table) at type 1 error of 5%
$Z_{\beta} = Z_{0.20} = 0.842$ (from Z table) at 80% power
$p_{1} - p_{2}$ = Difference in proportion of events in the two groups
P = Pooled prevalence = [prevalence in case group ($p_{1}$) + prevalence in control group ($p_{2}$)]/2

In the above example, let us assume that a previous study says that 20% of patients with myocardial infarction die within a specified time. The researcher feels that if the drug being tested increases survival to 30% then the finding can be considered clinically significant. The effect size will be the difference between proportions: 0.2 - 0.3 = -0.1. At a 5% significance level and 80% power the sample size will be

Pooled prevalence = (0.20 + 0.30)/2 = 0.25

$$\text{Sample size} = \frac{2\,(1.96+0.84)^{2}\; 0.25\,(1-0.25)}{(-0.1)^{2}} = 294$$

So the researcher needs 294 subjects per group.

So a simple calculation of sample size for a comparison between two independent groups can be done manually
by the given formulae, but for more than two groups, for matched data and for other complex calculations software should be used [appendix 1].

Sample size formula for animal studies
For animal studies there are two methods of calculating sample size. The most preferred method is the same method that has been mentioned in sample size calculation for testing the hypothesis. While all efforts should be made to calculate the sample size by that method, sometimes it is not possible to get information on the prerequisites needed for sample size calculation by power analysis, like standard deviation, effect size etc. In that condition a second method can be used, called the "resource equation method".[9] In this method a value E is calculated based on the decided sample size. The value of E should lie between 10 and 20 for optimum sample size. If the value of E is less than 10 then more animals should be included, and if it is more than 20 then the sample size should be decreased.

E = Total number of animals - Total number of groups

Suppose in an animal study a researcher formed 4 groups of animals having 8 animals each for different interventions; then the total number of animals will be 32 (4 × 8). Hence E will be

E = 32 - 4 = 28

This is more than 20, hence the number of animals in each group should be decreased. So if the researcher takes 5 rats in each group then E will be

E = 20 - 4 = 16

E is 16, which lies within 10-20, hence five rats per group for four groups can be considered an appropriate sample size. This is a crude method and should be used only if sample size calculation cannot be done by the power analysis method explained in the above section for testing the hypothesis.

REFERENCES
1. Jaykaran, Saxena D, Yadav P, Kantharia ND. Negative studies published in medical journals of India do not give sufficient information regarding power/sample size calculation and confidence interval. J Postgrad Med 2011;57:176-7.
2. Jaykaran, Yadav P, Kantharia ND. Reporting of sample size and power in negative clinical trials published in Indian medical journals. J Pharm Negative Results 2011;2:87-90.
3. Jaykaran, Saxena D, Yadav P. Negative studies published in Indian Medical journals are underpowered. Indian Pediatr 2011;48:490-1.
4. Naduvilath TJ, John RK, Dandona L. Sample size for ophthalmology studies. Indian J Ophthalmol [serial online] 2000;48:245. Available from: http://www.ijo.in/text.asp?2000/48/3/245/14864. [Last cited in 2012 Sep 2].
5. Patra P. Sample size in clinical research, the number we need. Int J Med Sci Public Health 2012;1:5-9.
6. Shah H. How to calculate sample size in animal studies? Natl J Physiol Pharm Pharmacol 2011;1:35-9.
7. Kasiulevicius V, Sapoka V, Filipaviciute R. Sample size calculation in epidemiological studies. Gerontology 2006;7:225-31.
8. Cai J, Zeng D. Sample size/power calculation for case-cohort studies. Biometrics 2004;60:1015-24.
9. Festing MF, Altman DG. Guidelines for the design and statistical analysis of experiments using laboratory animals. Institute for Laboratory Animal Research. ILAR J 2002;43:244-58.

How to cite this article: Charan J, Biswas T. How to calculate sample size for different study designs in medical research? Indian J Psychol Med 2013;35:121-6.
Source of Support: Nil, Conflict of Interest: None.
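As a supplementary sketch for the section on testing a hypothesis, the two-group formulas for quantitative and qualitative endpoints can be reproduced in a few lines of Python. This is illustrative only: the function names are hypothetical, and the Z values are the rounded table values used in the worked examples (1.96 and 0.84).

```python
def n_two_means(sd, d, z_alpha=1.96, z_beta=0.84):
    """Subjects per group when comparing two means.

    sd: standard deviation from earlier studies; d: difference in means to detect.
    """
    return 2 * sd ** 2 * (z_alpha + z_beta) ** 2 / d ** 2


def n_two_proportions(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Subjects per group when comparing two proportions p1 and p2."""
    pooled = (p1 + p2) / 2  # pooled prevalence P
    return 2 * (z_alpha + z_beta) ** 2 * pooled * (1 - pooled) / (p1 - p2) ** 2


# Antihypertensive drug example: SD 25 mmHg, difference of 10 mmHg
print(round(n_two_means(25, 10), 2))
# Myocardial infarction example: proportions 0.20 and 0.30
print(round(n_two_proportions(0.20, 0.30), 2))
```

The results are rounded for display rather than passed through a ceiling function, because floating-point error on a value that is exactly a whole number could otherwise push it up by one.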
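The resource equation check for animal studies is simple enough to automate. A minimal sketch, assuming equal group sizes; the function names and the wording of the returned advice are hypothetical, while the 10 to 20 range comes from the text.

```python
def resource_equation_e(groups, animals_per_group):
    """E = total number of animals - total number of groups (equal group sizes assumed)."""
    return groups * animals_per_group - groups


def assess_e(e, low=10, high=20):
    """Classify E against the 10 to 20 range given in the text."""
    if e < low:
        return "include more animals"
    if e > high:
        return "decrease sample size"
    return "appropriate"


# Worked example: 4 groups of 8 animals, then 4 groups of 5
for per_group in (8, 5):
    e = resource_equation_e(4, per_group)
    print(per_group, e, assess_e(e))
```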