AMA 4351 Statistical Epidemiology
Course Content
Data organization and statistical methods; selected topics include probability, measures of
association and risk, sample size and power calculations, meta-analysis, matched-design
analysis, logistic regression, Poisson regression, survival analysis, and regression techniques for
correlated outcomes. SAS/R/STATA/SPSS programs will be used to demonstrate the statistical
procedures for analyzing real data.
Introduction
Epidemiology is the basic quantitative science of public health, defined as follows:
“The study of the distribution & determinants of health-related states or events in specified
populations, and the application of this study to the control of health problems” — A Dictionary of
Epidemiology (Porta, 2008)
This definition of epidemiology includes several terms which reflect some of the important
principles of the discipline.
Study. Epidemiology is a scientific discipline, sometimes called “the basic science of public
health.” It has, at its foundation, sound methods of scientific inquiry.
Distribution. Epidemiology is concerned with the frequency and pattern of health events in a
population. Frequency includes not only the number of such events in a population, but also the
rate or risk of disease in the population. The rate (number of events divided by size of the
population) is critical to epidemiologists because it allows valid comparisons across different
populations.
Pattern refers to the occurrence of health-related events by time, place, and personal
characteristics.
• Time characteristics include annual occurrence, seasonal occurrence, and daily or even
hourly occurrence during an epidemic.
• Place characteristics include geographic variation, urban-rural differences, and location
of worksites or schools.
• Personal characteristics include demographic factors such as age, race, sex, marital
status, and socioeconomic status, as well as behaviors and environmental exposures.
This characterization of the distribution of health-related states or events is one broad aspect
of epidemiology called descriptive epidemiology. Descriptive epidemiology provides the What,
Who, When, and Where of health-related events.
Determinants. Epidemiology is also used to search for causes and other factors that
influence the occurrence of health-related events. Analytic epidemiology attempts to provide the
Why and How of such events by comparing groups with different rates of disease occurrence and
with differences in demographic characteristics, genetic or immunologic make-up, behaviors,
environmental exposures, and other so-called potential risk factors. Under ideal circumstances,
epidemiologic findings provide sufficient evidence to direct swift and effective public health
control and prevention measures.
Application. Epidemiology is more than “the study of.” As a discipline within public health,
epidemiology provides data for directing public health action. However, using epidemiologic
data is an art as well as a science. Consider an analogy with clinical medicine: to treat a
patient, a clinician must call upon experience and creativity as well as scientific knowledge.
Similarly, an epidemiologist uses the scientific methods of descriptive and analytic epidemiology
in “diagnosing” the health of a community, but also must call upon experience and creativity
when planning how to control and prevent disease in the community.
Place: these include climate, geology, presence of agents/vectors, population density,
economic development, nutritional practices, medical practices, etc.
Time: these include calendar time, time since an event, seasonality, temporal trends, etc.
Cross-sectional studies
In a cross-sectional study, we measure the frequency of a particular exposure(s) and/or outcome(s)
in a defined population at a particular point in time. As cross-sectional studies collect data on
existing (prevalent) cases, they are sometimes called prevalence studies. In a descriptive cross-
sectional study, we simply describe the frequency of the exposure(s) or outcome(s) in a defined
population. In an analytic cross-sectional study, we simultaneously collect information on both the
outcome of interest and the potential risk factor(s) in a defined population. We then compare the
prevalence of the outcome in the people exposed to each risk factor with the prevalence in those
not exposed.
Cross-sectional studies are analyzed using prevalence (risk) ratios or odds ratios as measures of
effect, because there is no follow-up time to account for.
Case-control studies
Case-control studies provide a relatively simple way to investigate causes of diseases, especially
rare diseases. When we use a case-control study to investigate the association between an
exposure and an outcome, we start by identifying individuals with the outcome of interest
(cases), and compare them to individuals without this outcome (controls). We obtain information
about one or more previous exposures from cases and controls, and compare the two groups to
see if each exposure is significantly more (or less) frequent in cases than in controls. The case-
control method (in particular, the method of analysis) was developed in the early 1950s as an
approach to the problem of investigating risk factors for diseases with long latent periods (a long
period of time between the exposure and the disease outcome), where cohort studies are
impractical.
Case-control studies are longitudinal, in contrast to cross-sectional studies. Case-control studies
have been called retrospective studies since the investigator is looking backward from the disease
to a possible cause. This can be confusing because the terms retrospective and prospective are also
used to describe the timing of data collection in relation to the current date. In this sense a case-
control study may be either retrospective, when all the data deal with the past, or prospective, in
which data collection continues with the passage of time.
Selection of Controls
This is probably the most difficult part about designing a case-control study. The key point is
that controls must be representative of the population that produced the cases, but must not have
the outcome in question. Poor control selection is the main source of bias in a case-control study
(selection bias). Two design considerations help address this:
a) Include more than one group of controls
Ideally, we should have one control group which represents the population which produced
the cases. However, this may not always be easy to achieve, and sometimes it may be useful
to have more than one group of controls.
b) How many controls per case?
There may be advantages in having more than one control (of the same type) per case. In a
case-control study we compare the prevalence of exposure in cases and controls, and we can
improve the statistical precision of this estimate by increasing the number of controls. Little
additional precision is gained, however, beyond about four controls per case.
Cohort studies
Cohort studies start by measuring exposure to a risk factor of interest. Individuals are classified
by their exposure status, and then followed over a period of time to see whether they develop one
or more outcomes. Variables of interest are specified and measured and the whole cohort is
followed up to see how the subsequent development of new cases of the disease (or other
outcome) differs between the groups with and without exposure. Because the data on exposure
and disease refer to different points in time, cohort studies are longitudinal.
Cohort studies are analyzed using rates and rate ratios to account for the follow-up time.
Measuring Disease Occurrence
Epidemiology studies the distribution of diseases in populations and factors related to them.
Examples
Prevalence is the proportion of a defined population that has the disease at a given point in
time. In a population of 1000 there are 50 cases of malaria:
p = 50/1000 = 0.05 or 5%.
Epidemiological terminology
In epidemiology, disease occurrence is frequently small relative to the population size.
Therefore, the proportion is multiplied by an appropriate number such as 10,000 and reported
as, for example, a prevalence of 4 per 10,000 persons.
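This scaling can be sketched numerically (a Python illustration; the function name and the `per` multiplier are choices for this sketch, using the malaria example above):

```python
def prevalence(cases, population, per=1):
    """Prevalence = existing cases / population, optionally scaled per `per` persons."""
    return cases * per / population

# Malaria example: 50 cases in a population of 1000
print(prevalence(50, 1000))              # 0.05
print(prevalence(50, 1000, per=10_000))  # 500.0 per 10,000 persons
```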
Exercise
In a county with 2300 inhabitants there have occurred 2 cases of leukemia. What is the prevalence
of leukemia per 100,000 persons?
Quantitative Aspects:
What is the Variance and Confidence Interval for the Prevalence?
For a sample of size n with x cases, the estimated prevalence is p̂ = x/n.
p̂ is approx. normal (remember the variance of the binomial distribution is np(1 − p), so
var(p̂) = p(1 − p)/n).
An approximate 95% confidence interval is therefore p̂ ± 1.96 √(p̂(1 − p̂)/n).
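A minimal numeric sketch of this normal-approximation (Wald) interval, here applied to the earlier malaria example:

```python
import math

def prevalence_ci(cases, n, z=1.96):
    """Wald 95% CI for a prevalence: p_hat +/- z*sqrt(p_hat*(1 - p_hat)/n)."""
    p_hat = cases / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, p_hat - z * se, p_hat + z * se

# Malaria example from the notes: 50 cases among 1000 people
p, lo, hi = prevalence_ci(50, 1000)
print(f"p = {p:.3f}, 95% CI {lo:.3f} to {hi:.3f}")  # p = 0.050, 95% CI 0.036 to 0.064
```

Note that with very few cases (as in the leukemia exercise) the normal approximation is poor, and an exact or Wilson interval would be preferable.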
Example
Exercise
In a county with 2300 inhabitants there have occurred 2 cases of leukemia. What is the
prevalence with 95% CI?
Examples
Cumulative incidence (I) is the proportion of an initially disease-free population that develops
the disease over a specified period.
In a malaria-free population of 1000 there are four new cases of malaria within one year: I =
4/1000 = 0.004 or 0.4%.
In a skin-cancer free population of 10,000 there are 11 new cases of skin cancer: I = 11/10, 000 =
0.0011 or 0.11%.
Exercise
In a rural county with 2000 children within pre-school age there have occurred 15 new cases of
leukemia within 10 years. What is the incidence?
Quantitative Aspects: How to determine the Variance and Confidence Interval for the
Incidence?
Computing the Variance of the Incidence
Treating the number of new cases as binomial with the size n of the population at risk,
var(Î) = I(1 − I)/n.
Consequently, an approximate 95% confidence interval is Î ± 1.96 √(Î(1 − Î)/n).
Example
Exercise
In a rural county with 2000 children within pre-school age there have occurred 15 new cases of
leukemia within 10 years. What is the incidence with 95% CI?
Examples
A cohort study is conducted to evaluate the relationship between dietary fat intake and the
development of prostate cancer in men. In the study, 100 men with high fat diet are compared with
100 men who are on low fat diet. Both groups start at age 65 and are followed for 10 years. During
the follow-up period, 10 men in the high fat intake group are diagnosed with prostate cancer and
5 men in the low fat intake group develop prostate cancer.
Each group contributes 100 × 10 = 1000 person-years. The incidence density is cID = 10/1000 =
0.01 per person-year in the high fat intake group and cID = 5/1000 = 0.005 per person-year in
the low fat intake group.
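The person-time arithmetic can be sketched as follows (this mirrors the simplification in the text, where all 100 men in each group are assumed to contribute the full 10 years of follow-up):

```python
def incidence_density(events, person_time):
    """Incidence density (rate) = new events / person-time at risk."""
    return events / person_time

high_fat = incidence_density(10, 100 * 10)  # 10 cases over 1000 person-years
low_fat = incidence_density(5, 100 * 10)    # 5 cases over 1000 person-years
rate_ratio = high_fat / low_fat
print(high_fat, low_fat, rate_ratio)         # 0.01 0.005 2.0
```

The rate ratio of 2.0 says the prostate cancer rate in the high-fat group is twice that in the low-fat group.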
Examples
Consider a population of n = 5 factory workers with X2 = 1 and all other Xi = 0 (here the
outcome might be lung cancer). During follow-up, there was only one case.
We also have T1 = 12, T2 = 2, T3 = 6, T4 = 12, T5 = 5, so that the total person-time is
ΣTi = 12 + 2 + 6 + 12 + 5 = 37, and the incidence density is ID = ΣXi / ΣTi = 1/37 ≈ 0.027
cases per person-time unit.
Example
Measures of effect: Risk Difference
The risk difference is RD = R1 − R0, the risk of the outcome in the exposed group minus the
risk in the unexposed group. RD = 0 indicates no effect of the exposure.
Example 1
Example 2
Exercise
Variance of RD
Since the two groups are independent, var(R̂D) = R1(1 − R1)/n1 + R0(1 − R0)/n0, where n1 and
n0 are the numbers of exposed and unexposed subjects.
Example 1
Interpretation:
Example 1
This means the risk of developing caries was 60% lower among the children using the new
toothpaste (RR = 0.4).
Example 2
This means persons with sun exposure were 40 times as likely to develop skin cancer as the
non-exposed (RR = 40).
Exercise 1
Estimator of RR
With a exposed cases among n1 exposed subjects and b unexposed cases among n0 unexposed
subjects, R̂R = (a/n1)/(b/n0).
Variance of RR
By the delta method (https://en.wikipedia.org/wiki/Delta_method),
var(log R̂R) ≈ 1/a − 1/n1 + 1/b − 1/n0.
A confidence interval for RR
A 95% CI is exp(log R̂R ± 1.96 √(1/a − 1/n1 + 1/b − 1/n0)).
Example
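As an illustrative sketch of these formulas (the 2×2 counts below are hypothetical, not the course's example):

```python
import math

def rr_ci(a, n1, b, n0, z=1.96):
    """Risk ratio with delta-method 95% CI: var(log RR) = 1/a - 1/n1 + 1/b - 1/n0."""
    rr = (a / n1) / (b / n0)
    se_log = math.sqrt(1/a - 1/n1 + 1/b - 1/n0)
    return rr, math.exp(math.log(rr) - z * se_log), math.exp(math.log(rr) + z * se_log)

# Hypothetical data: 30/100 exposed vs 15/100 unexposed develop the outcome
rr, lo, hi = rr_ci(30, 100, 15, 100)
print(f"RR = {rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}")  # RR = 2.00, 95% CI 1.15 to 3.48
```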
Odds
The odds of an outcome is the ratio of the number of times the outcome occurs to the number
of times it does not.
Suppose that p is the probability of the outcome; then odds = p/(1 − p).
It follows that p = odds/(1 + odds).
Odds Ratio
For a 2×2 table with a exposed cases, b exposed controls, c unexposed cases, and d unexposed
controls, the odds ratio is OR = (a/b)/(c/d) = ad/bc.
Examples
Example
A confidence interval for OR
Using var(log ÔR) ≈ 1/a + 1/b + 1/c + 1/d, a 95% CI is
exp(log ÔR ± 1.96 √(1/a + 1/b + 1/c + 1/d)).
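These formulas can be sketched as follows (hypothetical counts, chosen for this illustration):

```python
import math

def or_ci(a, b, c, d, z=1.96):
    """Odds ratio ad/bc with Woolf 95% CI: var(log OR) = 1/a + 1/b + 1/c + 1/d."""
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1/a + 1/b + 1/c + 1/d)
    return or_hat, math.exp(math.log(or_hat) - z * se_log), math.exp(math.log(or_hat) + z * se_log)

# Hypothetical 2x2 table: a=exposed cases, b=exposed controls, c=unexposed cases, d=unexposed controls
o, lo, hi = or_ci(20, 80, 10, 90)
print(f"OR = {o:.2f}, 95% CI {lo:.2f} to {hi:.2f}")  # OR = 2.25, 95% CI 0.99 to 5.09
```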
Generalized linear models
The Generalized Linear Model (GLiM, or GLM) is a statistical modelling framework
formulated by John Nelder and Robert Wedderburn in 1972. It is an umbrella term that
encompasses many other models, and it allows the response variable y to have an error
distribution other than the normal distribution. The models include linear regression, logistic
regression, and Poisson regression.
Examples
Introduction to Logistic Regression
Logistic regression models the probability p of a binary outcome as a function of one or more
exposures through the logit link: logit(p) = log(p/(1 − p)) = α + βx.
Properties of the Logit
Interpretation of parameters α and β
The logistic regression model
Compute Odds Ratio with 95%CI
The regression coefficients are on the log-odds scale; either exponentiate them or use the
“epiDisplay” package (in R) to display the odds ratios and 95% CIs.
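The course demonstrates this in R with epiDisplay; as a language-agnostic sketch of why exponentiating the coefficient gives the odds ratio, note that for a single binary exposure the logistic MLEs have a closed form (counts below are hypothetical):

```python
import math

# For one binary exposure, the logistic-model MLEs are:
# alpha = log-odds of the outcome in the unexposed, beta = log odds ratio.
def logistic_fit_binary(a, b, c, d):
    """a/b = cases/non-cases among exposed; c/d = cases/non-cases among unexposed."""
    alpha = math.log(c / d)              # log-odds in the unexposed
    beta = math.log((a / b) / (c / d))   # log odds ratio
    return alpha, beta

alpha, beta = logistic_fit_binary(20, 80, 10, 90)
print(f"beta = {beta:.3f}, OR = exp(beta) = {math.exp(beta):.2f}")  # beta = 0.811, OR = exp(beta) = 2.25
```

This is why `exp(coef)` in any logistic regression output reproduces the familiar 2×2-table odds ratio ad/bc.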
Multivariable logistic regression models
Conditional logistic regression in matched case-control studies
The conditional logistic regression model can then be specified as below:
Example
We will use data from the Infertility after Spontaneous and Induced Abortion study (the infert
dataset in R).
Let us run a conditional logistic regression to answer the question: what is the effect of previous
spontaneous abortions and induced abortions on the odds of being infertile?
From the conditional logistic regression model, both exposures were positively associated with
the odds of being infertile.
Poisson regression
Example
Inference about Model Parameters
Example 2
Lung cancer deaths in British male physicians (Frome, 1983).
What is the incidence rate ratio of smoking for 20 to 24 years versus 15 to 19 years?
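Before reading the model output, it helps to see that for a single categorical exposure a Poisson model's incidence rate ratio is simply a ratio of crude rates, with var(log IRR) = 1/d1 + 1/d0. A hedged Python sketch (deaths and person-years below are hypothetical, not the Frome data):

```python
import math

# IRR comparing group 1 to a reference group 0, from deaths and person-years.
def irr(d1, py1, d0, py0, z=1.96):
    """Rate ratio with 95% CI; var(log IRR) = 1/d1 + 1/d0."""
    rr = (d1 / py1) / (d0 / py0)
    se = math.sqrt(1 / d1 + 1 / d0)
    return rr, math.exp(math.log(rr) - z * se), math.exp(math.log(rr) + z * se)

# Hypothetical counts: 12 deaths in 4000 person-years vs 4 deaths in 5000 person-years
print(irr(12, 4000, 4, 5000))
```

In the regression formulation, log(person-years) enters the model as an offset, so exp(beta) for the exposure level is exactly this rate ratio.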
From the regression model, years of smoking >30 compared to 15 to 19 were significantly
associated with a higher rate of lung cancer mortality (all P-values <0.05 and the 95% CIs do
not include one).
The 95% CIs are very wide, suggesting very imprecise estimates, probably because of low
power (small sample size).
b) What is the effect of the number of cigarettes smoked on the risk of death from lung
cancer?
From the results, we can see that smoking one or more cigarettes, compared to not smoking,
was associated with a higher rate of lung cancer mortality.
Would the effect of number of cigarettes smoked and years of smoking change if we run a
multivariable model with both exposure variables? Would the model performance improve?
From the results above, we see the model performance based on the AIC (Akaike information
criterion) has improved (the AIC has dropped from 444.08 in model 1 above to 201.31 in this
model).
We see from the multivariable model that smoking 1 to 9 cigarettes is no longer associated with
lung cancer mortality. The IDRs for years smoked are adjusted for number of cigarettes smoked
and vice versa.
Survival analysis
In logistic regression, we were interested in studying how risk factors were associated with
presence or absence of disease. Sometimes, though, we are interested in how a risk factor or
treatment affects time to disease or some other event. Or we may have study dropout, and therefore
subjects for whom we do not know whether the disease occurred. In these cases, logistic
regression is not appropriate.
Survival analysis is used to analyze data in which the time until the event is of interest. The
response is often referred to as a failure time, survival time, or event time. Examples: time to death
after diagnosis of cancer or HIV, time to relapse after malnutrition treatment, time to recovery
from malaria etc.
Survival analysis is used to analyze the rates of occurrence of events over time, without
assuming the rates are constant. Generally, survival analysis allows for incompletely observed
(censored) responses.
The survival time response is usually continuous, but may be incompletely determined for some
subjects; i.e., for some subjects we may know only that their survival time was at least some
time t, whereas for other subjects we will know their exact time of event. Incompletely
observed responses are censored. The survival time is always ≥ 0.
Censoring is present when we have some information about a subject’s event time, but we don’t
know the exact event time. For the analysis methods we will discuss to be valid, censoring
mechanism must be independent of the survival mechanism.
There are generally three reasons why censoring might occur:
• A subject does not experience the event before the study ends.
• A person is lost to follow-up during the study period
• A person withdraws from the study
These are all examples of right-censoring.
Right censoring occurs when a subject leaves the study before an event occurs, or the study
ends before the event has occurred. For example, consider patients in a clinical trial to study
the effect of treatments on stroke occurrence. The study ends after 5 years. Those patients who
have had no strokes by the end of the study are right censored.
Left censoring is when the event of interest has already occurred before enrolment. This is very
rarely encountered.
Regardless of the type of censoring, we must assume that it is non-informative about the event;
that is, the censoring is caused by something other than the impending failure.
Survival Function
The survival function S(t) = P(T > t) is the probability that a subject survives beyond time t; it
starts at S(0) = 1 and decreases over time.
Kaplan-Meier Curve
The Kaplan–Meier method estimates the survival probability from the observed survival times
(both censored and uncensored). The survival probability is plotted against time t in the
Kaplan–Meier survival curve. The survival curve is useful for reading off the median survival
time (the time at which the survival probability is 50%).
The Kaplan–Meier method is suitable for simple survival analysis and does not consider other
independent variables (confounding factors) while analyzing survival curves. If there are other
confounding factors that you want to include in the model, you should use the Cox proportional
hazards (PH) model (Cox regression).
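The product-limit calculation behind the Kaplan–Meier curve can be sketched in a few lines (a minimal Python illustration with hypothetical times; the course itself uses R's survfit). At each event time, the survival estimate is multiplied by (1 − deaths/at-risk):

```python
# Minimal Kaplan-Meier sketch: times with event=1 (death) or event=0 (censored).
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each event time; S multiplies by (1 - d/n_at_risk)."""
    data = sorted(zip(times, events))
    s = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = sum(e for tt, e in data if tt == t)        # deaths at time t
        at_risk = sum(1 for tt, _ in data if tt >= t)  # subjects still at risk at t
        if d > 0:
            s *= 1 - d / at_risk
            curve.append((t, s))
        i += sum(1 for tt, _ in data if tt == t)       # skip all ties at time t
    return curve

times = [5, 11, 11, 30, 42, 60, 60]
events = [1, 1, 1, 0, 1, 1, 0]
print(kaplan_meier(times, events))
```

Note how the censored subject at time 30 never contributes a drop in the curve but does shrink the at-risk set for later event times.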
One of the objectives in survival analysis is to compare the survival functions in different groups,
e.g., leukemia patients as compared to cancer-free controls. If both groups were followed until
everyone died, both survival curves would end at 0%. However, one group might have survived
on average a lot longer than the other group. Survival analysis addresses this problem by
comparing the hazard at different times over the observation period. Proportional hazards
assumption says that the ratio of hazards between groups is constant over time.
The Cox proportional hazards (CPH) model uses the hazard function instead of survival
probabilities or survival times. The hazard ratio is the measure of effect in the CPH model.
Note that the Cox proportional hazards regression model makes the following assumptions:
a) Non-informative censoring, i.e. the censoring is independent of the event.
b) Proportional hazards, i.e. a constant HR over time.
c) There should be a linear relationship between the log hazard and the independent
variables.
d) The independent variables should be independent of survival time, i.e. they should not
change with time.
The hazard ratio (HR) is similar to a relative risk and is the ratio of the hazard (failure) rates in
two groups (e.g. treated vs. control). HR = 1 indicates no difference between the two groups;
HR > 1 indicates a higher hazard of the event in the first group, and HR < 1 a lower hazard.
Log-rank test
The log-rank test is a nonparametric hypothesis test which compares two or more survival
curves (i.e. survival times for two or more groups).
The log-rank test calculates its test statistic by comparing the observed number of events to the
expected number of events in the underlying groups.
The log-rank test statistic is compared against the critical value from the χ² distribution with
g − 1 degrees of freedom, where g is the number of groups. The drawback of the log-rank test
is that it does not account for other independent variables affecting the survival time.
Let's fit a survival analysis model in R using the NCCTG Lung Cancer Data.
The survfit function fits the survival model and returns the number of events and the median
survival time (with 95% CI). Of the 228 cancer patients, 165 died; the median survival time was
310 days (95% CI 285 to 363).
We see the first death occurred after 5 days, and the survival function at day 5 was 0.9956. The
next death occurred on day 11 (3 deaths) and the survival function was 0.9825.
Next we ask the question: was survival similar between males and females (sex)?
a) We will use the log-rank test to compare the survival times between the two groups.
The null hypothesis tested is that the survival times do not differ between the two groups. From
the log-rank test (P = 0.001), it is clear that survival differed by sex. We also plot the
Kaplan–Meier curve to demonstrate this.
What was the effect of sex on hazard of cancer death?
Females had a 41% lower hazard of cancer death (hazard ratio 0.588, 95% CI 0.42 to 0.82,
P = 0.001).
We can also test if the proportional hazard assumption was met using Schoenfeld residuals.
There is no evidence against the proportional hazards assumption.
If your data points are correlated, this assumption of independence is violated. Fortunately, there
are still ways to produce a valid regression model with correlated data.
Correlated Data
Correlation in data occurs primarily through multiple measurements (e.g. two measurements are
taken on each participant 1 week apart, and data points within individuals are not independent) or
if there is clustering in the data (e.g. a survey is conducted among students attending different
schools, and data points from students within a given school are not independent).
The result is that the outcome has been measured at the level of an individual observation, but
there is a second level, either within an individual (in the case of multiple time points) or within
clusters, on which individual data points can be correlated. Ignoring this correlation means that
the standard error cannot be accurately computed, and in most cases it will be artificially low.
Standard regression models compute the standard error (SE) assuming that the observations are
independent, with the effective sample size equal to the number of individuals in the study.
The correlation induced for whatever reason between individuals in a community can be
measured by the intra-cluster correlation, ρ('rho') (also called within-cluster correlation).
ρ = 0 means that responses of individuals within the same cluster are no more alike than
those of individuals from different clusters.
ρ = 1 means that all responses of individuals within the same cluster are identical.
An alternative way of thinking about intra-cluster correlation is in terms of between-
cluster variation.
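One standard way to quantify the impact of intra-cluster correlation on precision (not stated in the text, but standard practice) is the design effect, DEFF = 1 + (m − 1)ρ for clusters of average size m: the factor by which the variance of an estimate is inflated relative to an independent sample of the same size. A sketch:

```python
def design_effect(m, rho):
    """Design effect DEFF = 1 + (m - 1)*rho for equal-sized clusters of size m."""
    return 1 + (m - 1) * rho

# Even a small rho inflates the variance substantially when clusters are large:
print(design_effect(30, 0.01))
print(design_effect(30, 0.05))
```

With m = 30 and ρ = 0.05, DEFF = 2.45, i.e. the naive standard errors understate the true variance by more than half.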
As previously mentioned, simple regression will produce inaccurate standard errors with
correlated data and therefore should not be used.
Instead, you want to use models that can account for the correlation that is present in your data.
If the correlation is due to some grouping variable (e.g. school) or repeated measures over time,
then you can choose between Generalized Estimating Equations or Multilevel Models. These
modeling techniques can handle either binary or continuous outcome variables, so can be used to
replace either logistic or linear regression when the data are correlated.
Variance
The ri terms are the residuals.
The residuals are the difference between the outcome observed and the outcome predicted
by the model. When observations are independent, then the summation is performed on the
individual-level residuals. If data are "clustered", then cluster-level residuals are calculated
and summed over the clusters.
Note: This does not make any assumptions about independence within clusters but does
assume that there is independence between clusters.
Independence
This choice implies that you don't think the data are correlated. If you don't think the data are
correlated, you probably don't need to be using GEE.
Exchangeable
This choice implies that within a "cluster", e.g. a household, any two observations are equally
correlated, but that there is no correlation between observations from different "clusters". This is
a common choice.
Autocorrelation
This choice is useful for measures repeated over time, e.g. repeated measurements on the same
individual such as episodes of diarrhoea. Repeated measurements on an individual are most likely
to be most strongly correlated when they are made a short time apart. The greater the time interval
between two measurements the smaller the correlation is likely to be.
Summary
The main aspects of GEE analysis are:
1. GEE can include robust standard errors.
2. You need to specify how you think the data are correlated. The usual choice is 'exchangeable'.
3. If an exchangeable correlation is specified, point estimates, e.g., odds ratio, rate ratio, are
adjusted for correlations in the data.
Example
We are going to use a case-control study conducted in nine sites (clusters). The outcome of
interest is being a case (dead) or a control (alive). There are many exposure variables, and we
need to find out whether they are associated with being a case, adjusting for the clustering
within each site. We assume that patients from each site differ from those at other sites, so if we
ignore this clustering we will get invalid standard errors for our measures of effect (odds
ratios). We will first run a glm logistic regression model ignoring the clustering, then adjust for
the clustering using robust standard errors. Later we will run the GEE and the multilevel
(random effects) models. Let's start by exploring the data:
In the columns we have the sites (clusters) and in the rows we have some of the exposures and
the outcome (adm_dead).
Let's start by running a multivariable logistic regression ignoring the clustering within the sites.
From the output, we see urban, abc and hiv exposures were significantly associated with being a
case (P-values<0.05 with *). Can you go ahead and estimate the odds ratios plus 95%CI?
For the exposure variable urban, the odds ratio is 0.4292 (= exp(−0.8459)) and the 95% CI is
0.2683 to 0.6865, as shown above.
Let us now run the same model but use cluster-robust standard errors.
Note that the estimated coefficients don't change, but the standard errors do (we now have
robust standard errors corrected for clustering at each site); the z-values and P-values have
changed too because of the changed standard errors (z value = coefficient/std.error). Note also
that the variable urban is now not significantly associated with being a case.
We will now run the same model but use the GEE model to correct for the clustering.
Looking at the results, now only HIV is associated with being a case; the coefficients and
std.errors are different.
The random effects model explicitly models the clustering and provides the variance across the
sites (0.582). This is the most robust method. I recommend using this approach.
Notice it is the abc and HIV variables that are significantly associated with being a case. Below
is a summary of the regression coefficients plus std.errors for all the models.
Exposure     Naïve model          Robust-SE model      GEE model            Random effects model
variable     Coef.      Std.err   Coef.      Std.err   Coef.      Std.err   Coef.      Std.err
Intercept    -0.24      0.5488    -0.24      0.6959    0.6348     0.2585*   -0.2748    0.7010
Sex          -0.1115    0.2363    -0.1115    0.1854    -0.0712    0.1162    -0.1919    0.2504
Age          -0.036     0.0225    -0.036     0.0232    -0.0011    0.0007    -0.0454    0.0241
Urban        -0.8459    0.2397*   -0.8459    0.5060    -0.0819    0.07455   -0.6699    0.5788
Abc_level1   0.7159     0.3520*   0.7159     0.3615*   0.0019     0.0718    0.7599     0.3707*
Abc_level2   1.6955     0.3008*   1.6955     0.3197*   0.00219    0.11899   1.732      0.3184*
HIV_level1   1.7129     0.5320*   1.7129     0.3807*   0.3317     0.1033*   1.4673     0.5519*
HIV_level2   0.8712     0.4078*   0.8712     0.3695*   0.13899    0.1077    0.533      0.4423
HIV_level9   -0.3785    0.8254    -0.3785    0.9445    -0.7504    0.249*    -0.4927    0.8581
(* P < 0.05)
Meta-analysis
It is now common for important clinical questions in medical research to be addressed in several
studies. This can be confusing for a medical practitioner. Which studies' results should be
followed? How can the information from the mass of data published be summarised? Meta-
analysis is a quantitative tool that can be used to summarise information from many studies. An
informal literature review can be too subjective and misleading, whereas meta-analysis can assist
with an overall conclusion. This combines results from different studies to give an overall
summary estimate (and confidence interval). The concept of meta-analysis is fairly easy to
understand. The purpose of meta-analysis is to summarise information from different studies.
Definition of meta-analysis (from Glass, 1976): The statistical analysis of a large collection of
analysis results for the purpose of integrating the findings.
Meta-analysis provides the highest level of evidence in medical research (it sits at the top of
the evidence hierarchy).
An important step in a systematic review is the thoughtful consideration of whether it is appropriate
to combine the numerical results of all, or perhaps some, of the studies. Such a meta-analysis
yields an overall statistic (together with its confidence interval) that summarizes the effectiveness
of an experimental intervention compared with a comparator intervention. Potential advantages of
meta-analyses include the following:
1. To improve precision. Many studies are too small to provide convincing evidence about
intervention effects in isolation. Estimation is usually improved when it is based on more
information.
2. To answer questions not posed by the individual studies. Primary studies often involve a
specific type of participant and explicitly defined interventions. A selection of studies in
which these characteristics differ can allow investigation of the consistency of effect
across a wider range of populations and interventions. It may also, if relevant, allow
reasons for differences in effect estimates to be investigated.
3. To settle controversies arising from apparently conflicting studies or to generate new
hypotheses. Statistical synthesis of findings allows the degree of conflict to be formally
assessed, and reasons for different results to be explored and quantified.
Of course, the use of statistical synthesis methods does not guarantee that the results of a review
are valid, any more than it does for a primary study. Moreover, like any tool, statistical methods
can be misused.
Principles of meta-analysis
The commonly used methods for meta-analysis follow these basic principles:
1. An intervention effect estimate Yi (e.g. a log odds ratio) is obtained from each of the
included studies.
2. A summary (pooled) estimate is computed as a weighted average of the study estimates:
Summary estimate = Σ(Wi Yi) / Σ Wi,
where Yi is the intervention effect estimated in the ith study, Wi is the weight given to the
ith study, and the summation is across all studies. Note that if all the weights are the same
then the weighted average is equal to the mean intervention effect. The bigger the weight
given to the ith study, the more it will contribute to the weighted average.
3. The combination of intervention effect estimates across studies may optionally
incorporate an assumption that the studies are not all estimating the same intervention
effect, but estimate intervention effects that follow a distribution across studies. This is
the basis of a random-effects meta-analysis. Alternatively, if it is assumed that each study
is estimating exactly the same quantity, then a fixed-effect meta-analysis is performed.
4. The standard error of the summary intervention effect can be used to derive a confidence
interval, which communicates the precision (or uncertainty) of the summary estimate; and
to derive a P value, which communicates the strength of the evidence against the null
hypothesis of no intervention effect.
5. As well as yielding a summary quantification of the intervention effect, all methods of
meta-analysis can incorporate an assessment of whether the variation among the results
of the separate studies is compatible with random variation, or whether it is large enough
to indicate inconsistency of intervention effects across studies.
6. The problem of missing data is one of the numerous practical considerations that must be
thought through when undertaking a meta-analysis. In particular, review authors should
consider the implications of missing outcome data from individual participants (due to
losses to follow-up or exclusions from analysis).
A very common and simple version of the meta-analysis procedure is commonly referred to as
the inverse-variance method. This approach is implemented in its most basic form in RevMan,
and is used behind the scenes in many meta-analyses of both dichotomous and continuous data.
The inverse-variance method is so named because the weight given to each study is chosen to be
the inverse of the variance of the effect estimate (i.e. 1 over the square of its standard error).
Thus, larger studies, which have smaller standard errors, are given more weight than smaller
studies, which have larger standard errors. This choice of weights minimizes the imprecision
(uncertainty) of the pooled effect estimate.
• The inverse-variance method calculates a summary estimate as a weighted average of the
individual study estimates.
A fixed-effect meta-analysis using the inverse-variance method calculates the weighted average as:

summary estimate = Σ(Yi / SEi²) / Σ(1 / SEi²)

where Yi is the intervention effect estimated in the ith study, SEi is the standard error of that estimate, and the summation is across all studies. The basic data required for the analysis are therefore an estimate of the intervention effect and its standard error from each study. A fixed-effect meta-analysis is valid under the assumption that all studies are estimating the same underlying intervention effect, referred to variously as a 'fixed-effect', 'common-effect' or 'equal-effects' assumption. However, the result of the meta-analysis can be interpreted without making such an assumption.
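As a numeric illustration of this weighted average, here is a minimal sketch in R. The effect estimates and standard errors are hypothetical values chosen for illustration, not taken from any study in these notes:

```r
# Hypothetical intervention effect estimates (log risk ratios) and
# their standard errors from three studies -- illustrative values only
yi  <- c(-0.25, -0.10, -0.18)
sei <- c(0.12, 0.20, 0.15)

wi        <- 1 / sei^2                 # inverse-variance weights
pooled    <- sum(wi * yi) / sum(wi)    # fixed-effect summary estimate
se_pooled <- sqrt(1 / sum(wi))         # standard error of the summary

round(pooled, 3)     # about -0.201
round(se_pooled, 3)  # about 0.085
```

Note how the first study, with the smallest standard error, contributes the largest weight to the pooled estimate.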
Heterogeneity can be assessed with Cochran's Q statistic, the weighted sum of squared differences between the individual study effects and the summary effect:

Q = Σ(1 / SEi²)(Yi − summary estimate)²

This is large if the average distance between the individual study effects and the summary effect is large. The statistic is referred to the χ² distribution with k − 1 degrees of freedom, where k is the number of studies:
• if statistically significant, there is evidence against the null hypothesis of a common effect for all studies;
• if not statistically significant, there is no evidence of a heterogeneous effect across trials, i.e. we would conclude that there is a common effect (homogeneity) across trials.
You do not have to do this calculation by hand: statistical packages that perform meta-analysis include a test for heterogeneity.
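Still, the calculation is simple enough to sketch by hand in R. The effect estimates and standard errors below are hypothetical, for illustration only:

```r
# Hypothetical effect estimates (log scale) and their standard errors
yi  <- c(-0.25, -0.10, -0.18)
sei <- c(0.12, 0.20, 0.15)

wi     <- 1 / sei^2                       # inverse-variance weights
pooled <- sum(wi * yi) / sum(wi)          # fixed-effect summary estimate

Q  <- sum(wi * (yi - pooled)^2)           # Cochran's Q statistic
df <- length(yi) - 1                      # degrees of freedom (k - 1)
p  <- pchisq(Q, df, lower.tail = FALSE)   # P-value from the chi-square

round(Q, 2)  # about 0.44
round(p, 2)  # about 0.80 -- no evidence of heterogeneity here
```

With a P-value this large, these hypothetical studies would be treated as homogeneous and a fixed-effect analysis would be appropriate.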
In a random-effects meta-analysis, the between-study variance (often denoted τ²) is used to modify the study weights. The formula for the random-effects summary estimate is similar to that for the fixed-effect summary estimate; the difference is the weighting. Each weight incorporates the between-study variance in addition to the within-study variance (a common choice is 1/(SEi² + τ²)). Estimating τ² is complex and is not discussed here; you only need to know that this variance is taken into account, and that this is how the random-effects method differs from the fixed-effect method.
Random effects
• assumes the true effect differs between studies
• the true effects vary randomly about the population 'average'
• the between-study variance needs to be estimated
• weights for each trial incorporate the between-study variance
Examples
Let us use data from this study:
To conduct a meta-analysis in R we need the package: metafor
We start by fitting a fixed-effect meta-analysis model (assuming no heterogeneity).
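Since the study's data table is not reproduced in this text, the sketch below uses hypothetical effect estimates and standard errors simply to show the shape of the metafor call (the rma() function with method = "FE"):

```r
# install.packages("metafor")   # once, if not already installed
library(metafor)

# Hypothetical log risk ratios and standard errors -- placeholders
# for the study data shown in the notes
yi  <- c(-0.25, -0.10, -0.18)
sei <- c(0.12, 0.20, 0.15)

fe <- rma(yi = yi, sei = sei, method = "FE")  # fixed-effect model
summary(fe)  # reports the pooled estimate, the Q test and I^2
```

The summary() output includes the pooled estimate, the heterogeneity test, and the I² statistic discussed next.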
The I² statistic tells us the percentage of total variability that is due to heterogeneity; in this case it was 0%, meaning there was no variability across the studies beyond chance.
There is also a formal statistical test for heterogeneity using the chi-square test; in this case the statistic was 1.24 with P = 0.74. The null hypothesis is that there is no heterogeneity. Since the P-value is 0.74, we have no evidence to reject the null hypothesis, and we therefore conclude there is no heterogeneity. The no-heterogeneity assumption of the fixed-effect method therefore holds, and we can use the fixed-effect method in this case.
From the results above, the pooled effect is -0.16 (on the log scale). To get the effect as a risk ratio, we exponentiate this value.
We can also write code to get the 95% CI for the pooled estimate.
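For example, using the pooled log estimate quoted above (the standard error value here is hypothetical, for illustration):

```r
est <- -0.16  # pooled effect on the log scale (from the output above)
se  <- 0.07   # its standard error -- hypothetical value for illustration

rr <- exp(est)                                 # pooled risk ratio
ci <- exp(est + c(-1, 1) * qnorm(0.975) * se)  # 95% CI on the ratio scale

round(rr, 2)  # 0.85
round(ci, 2)
```

Note that the interval is computed on the log scale, where the estimate is approximately normal, and only then exponentiated.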
The weights displayed are calculated using the inverse-variance method discussed earlier. Studies with very low variances (typically large, precise studies) receive the largest weight, as in this case the IDEAL study, with 37% weight. A forest plot displays effect estimates and confidence intervals for both individual studies and meta-analyses. Each study is represented by a
confidence intervals for both individual studies and meta-analyses. Each study is represented by a
block at the point estimate of intervention effect with a horizontal line extending either side of the
block. The area of the block indicates the weight assigned to that study in the meta-analysis while
the horizontal line depicts the confidence interval (usually with a 95% level of confidence). The
area of the block and the confidence interval convey similar information, but both make different
contributions to the graphic. The confidence interval depicts the range of intervention effects
compatible with the study’s result. The size of the block draws the eye towards the studies with
larger weight (usually those with narrower confidence intervals), which dominate the calculation
of the summary result, presented as a diamond at the bottom.
Note that when there is no evidence of heterogeneity (P ≥ 0.05), the fixed-effect and random-effects methods yield essentially the same pooled measure (use the fixed-effect method). When the heterogeneity test P-value is < 0.05, there is evidence of heterogeneity; use the random-effects method.
In the above example, since the heterogeneity test gave P ≥ 0.05, you should use the fixed-effect method, but the random-effects method will yield similar results, as shown below.
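With metafor, the comparison can be sketched as follows (again with hypothetical data; when the estimated between-study variance τ² is zero, the two methods coincide exactly):

```r
library(metafor)

yi  <- c(-0.25, -0.10, -0.18)  # hypothetical log risk ratios
sei <- c(0.12, 0.20, 0.15)     # hypothetical standard errors

fe <- rma(yi = yi, sei = sei, method = "FE")  # fixed-effect model
re <- rma(yi = yi, sei = sei, method = "DL")  # DerSimonian-Laird random effects

c(fixed = as.numeric(coef(fe)), random = as.numeric(coef(re)))
```

For these homogeneous hypothetical data the DL estimate of τ² is truncated at zero, so the random-effects weights reduce to the fixed-effect weights and the pooled estimates agree.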
Sample size and power calculations
It is important to ensure at the design stage that the proposed number of subjects to be recruited
into any study will be appropriate to answer the main objective(s) of the study. A small study
may fail to detect important effects on the outcomes of interest, or may estimate them too
imprecisely, no matter how good its design may be in other respects. A study larger than
necessary, while less common in practice, may waste valuable resources.
The sample size calculation is generally required at the study design stage, before patient
enrolment has begun. There are several reasons for this.
• Firstly, from a scientific perspective, testing too few might lead to failure to detect an
important effect, whereas testing too many might lead to detecting a statistically significant
yet clinically insignificant effect.
• Secondly, from an ethical viewpoint, testing too many subjects can lead to unnecessary
harm or potentially unnecessary sacrifice in the case of animal studies. Conversely, testing
too few is also unethical, as an underpowered study might not contribute to the evidence-
based field of medicine.
• Thirdly, from an economic perspective, testing too many will lead to unnecessary costs
and testing too few will be potentially wasteful if the trial is unable to address the scientific
question of interest.
For this reason, many funders and institutional review boards require an a priori sample size
calculation, which is included in the study protocol. Adaptive trial designs, whereby prespecified
modifications can be made to the trial after its inception, can potentially improve flexibility and
efficiency.
Components
There are four principal components required to calculate the sample size (Table below). These
components are specified via parameters. Working under a hypothesis testing framework, we
assume a null hypothesis (H0) and an alternative hypothesis (H1). In practice, we do not know the
‘truth’, so we base our inferences on a statistical test applied to a random sample from the
population.
Two types of error can occur. The first is a Type I error, where the null hypothesis is true but we incorrectly reject it. The second is a Type II error, where the null hypothesis is false but we incorrectly fail to reject it. Specification of the Type I (denoted α) and Type II (denoted β, but more commonly reported as its complement, the power = 1 − β) error rate parameters is required for the sample size calculation. Conventional choices are α = 0.05 or 0.01 (corresponding to a significance level of 5% or 1%, respectively) and β = 0.2 or 0.1 (corresponding to 80% or 90% power, respectively). However, there are situations in which these parameters might be increased or decreased, depending on the clinical context of the study.
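These conventional choices translate into the standard normal quantiles that appear in sample size formulas, which can be checked quickly in R:

```r
alpha <- 0.05
power <- 0.80

z_alpha <- qnorm(1 - alpha / 2)  # critical value for a two-sided 5% test
z_beta  <- qnorm(power)          # quantile corresponding to 80% power

round(z_alpha, 2)  # 1.96
round(z_beta, 2)   # 0.84
```

These two quantiles, together with the minimal clinically relevant difference and the outcome's variability, are the ingredients of the standard sample size formulas used below.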
The minimal clinically relevant difference is the smallest difference in outcome between the study
groups, or effect size, that is of scientific interest to the investigator.
Power Analysis in R
We demonstrate how to estimate sample size for two proportions
Example
Suppose we want to randomly sample male and female college undergraduate students and ask
them if they consume alcohol at least once a week. Our null hypothesis is no difference in the
proportion that answer yes. Our alternative hypothesis is that there is a difference. This is a two-
sided alternative; one gender has a higher proportion, but we don't know which. We would like to
detect a difference as small as 5%. How many students do we need to sample in each group if we
want 80% power and a significance level of 0.05?
Solution
The study power is 80%, α = 5%, and the effect size is a 5 percentage point difference (p1 = 0.50, p2 = 0.55).
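In base R this calculation can be sketched with power.prop.test() (no extra package needed; the pwr package's pwr.2p.test() gives a very similar answer):

```r
# Two-sided comparison of two proportions: p1 = 0.50 vs p2 = 0.55,
# with 80% power and a 5% significance level
res <- power.prop.test(p1 = 0.50, p2 = 0.55, power = 0.80,
                       sig.level = 0.05)
res
ceiling(res$n)  # required sample size per group, on the order of 1,565
```

Because the difference to detect (5 percentage points) is small relative to the variability of proportions near 0.5, the required sample size per group is large.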
Example 2
Suppose in the above example we already know that the proportion for males is 10% and for females is 5%.
What would be the required sample size?
Solution
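A sketch with power.prop.test(), now plugging in the known proportions:

```r
# Two-sided comparison: p1 = 0.10 (males) vs p2 = 0.05 (females),
# with 80% power and a 5% significance level
res <- power.prop.test(p1 = 0.10, p2 = 0.05, power = 0.80,
                       sig.level = 0.05)
ceiling(res$n)  # required sample size per group
```

Although the absolute difference is the same 5 percentage points as before, proportions near 0 have much smaller variance than proportions near 0.5, so far fewer subjects per group are needed.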
Comparing means between groups
Example
Suppose a certain food supplement is being tested on children with severe acute malnutrition. The
researcher hypothesized that the food supplement would change the weight-for-length z score from a
mean of -3.2 to -2.7 (an improvement of 0.5 z scores). Assuming a common standard deviation of
1.5, power of 80% and alpha of 5%, what would be the required sample size?
Solution
Delta=0.5 (change from -3.2 to -2.7)
Common sd=1.5, Power=0.8, α=0.05
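This calculation can be checked with base R's power.t.test():

```r
# Two-sample t-test: detect a difference of 0.5 z scores with a
# common SD of 1.5, 80% power, and a 5% significance level
res <- power.t.test(delta = 0.5, sd = 1.5, power = 0.80,
                    sig.level = 0.05)
ceiling(res$n)      # children required per group (about 143)
2 * ceiling(res$n)  # total required (about 286)
```

The per-group figure, doubled, reproduces the total of roughly 286 children stated below.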
Therefore, the researcher will require a total of 286 children (~143 in each group).