AMA 4351 Statistical Epidemiology


Course outlines

AMA 4351: Statistical Epidemiology


Contact hours: 45
Prerequisites: AMA 4306 Theory of Estimation
Purpose of the Course
To equip the student with statistical techniques in epidemiology.

Expected Learning Outcomes of the Course


By the end of the course unit, the student should be able to:
i. Organize epidemiological data into an analyzable form
ii. Formulate an appropriate statistical method for an epidemiological study outcome
iii. Apply statistical models to epidemiological data using appropriate statistical software
iv. Interpret the results of an epidemiological study

Course Content
Data organization and statistical methods; selected topics include probability, measures of
association and risk, sample size and power calculations, meta-analysis, matched-design
analysis, logistic regression, Poisson regression, survival analysis and regression techniques for
correlated outcomes. SAS/R/STATA/SPSS programs will be used to demonstrate the statistical
procedures for analyzing real data.

Introduction
Epidemiology is the basic quantitative science of public health, defined as follows:

“The study of the distribution and determinants of health-related states or events in specified
populations, and the application of this study to the control of health problems” — A Dictionary of
Epidemiology (Porta, 2008)

This definition of epidemiology includes several terms which reflect some of the important
principles of the discipline.

Study. Epidemiology is a scientific discipline, sometimes called “the basic science of public
health.” It has, at its foundation, sound methods of scientific inquiry.

Distribution. Epidemiology is concerned with the frequency and pattern of health events in a
population. Frequency includes not only the number of such events in a population, but also the
rate or risk of disease in the population. The rate (number of events divided by size of the
population) is critical to epidemiologists because it allows valid comparisons across different
populations.
Pattern refers to the occurrence of health-related events by time, place, and personal
characteristics.
• Time characteristics include annual occurrence, seasonal occurrence, and daily or even
hourly occurrence during an epidemic.
• Place characteristics include geographic variation, urban-rural differences, and location
of worksites or schools.
• Personal characteristics include demographic factors such as age, race, sex, marital
status, and socioeconomic status, as well as behaviors and environmental exposures.

This characterization of the distribution of health-related states or events is one broad aspect
of epidemiology called descriptive epidemiology. Descriptive epidemiology provides the What,
Who, When, and Where of health-related events.

Determinants. Epidemiology is also used to search for causes and other factors that

influence the occurrence of health-related events. Analytic epidemiology attempts to provide the
Why and How of such events by comparing groups with different rates of disease occurrence and
with differences in demographic characteristics, genetic or immunologic make-up, behaviors,
environmental exposures, and other so-called potential risk factors. Under ideal circumstances,
epidemiologic findings provide sufficient evidence to direct swift and effective public health
control and prevention measures.

Health-related states or events. Originally, epidemiology was concerned with epidemics of


communicable diseases. Then epidemiology was extended to endemic communicable diseases
and non-communicable infectious diseases. More recently, epidemiologic methods have been
applied to chronic diseases, injuries, birth defects, maternal-child health, occupational health, and
environmental health. Now, even behaviors related to health and well-being (amount of exercise,
seat-belt use, etc.) are recognized as valid subjects for applying epidemiologic methods. In these
lessons we use the term “disease” to refer to the range of health-related states or events.
Specified populations. Although epidemiologists and physicians in clinical practice are both
concerned with disease and the control of disease, they differ greatly in how they view “the
patient.” Clinicians are concerned with the health of an individual; epidemiologists are
concerned with the collective health of the people in a community or other area. When faced
with a patient with diarrheal disease, for example, the clinician and the epidemiologist have
different responsibilities. Although both are interested in establishing the correct diagnosis, the
clinician usually focuses on treating and caring for the individual. The epidemiologist focuses on
the exposure (action or source that caused the illness), the number of other persons who may
have been similarly exposed, the potential for further spread in the community, and interventions
to prevent additional cases or recurrences.

Application. Epidemiology is more than “the study of.” As a discipline within public health,
epidemiology provides data for directing public health action. However, using epidemiologic
data is an art as well as a science. Consider again the medical model used above: To treat a
patient, a clinician must call upon experience and creativity as well as scientific knowledge.
Similarly, an epidemiologist uses the scientific methods of descriptive and analytic epidemiology
in “diagnosing” the health of a community, but also must call upon experience and creativity

when planning how to control and prevent disease in the community.

Epidemiology has two major branches:


• Descriptive epidemiology: examining the distribution of disease in a population and
observing the basic features of its distribution.
• Analytic epidemiology: investigating a hypothesis about the cause of disease by studying
how exposures relate to disease
Three essential characteristics of disease assessed in descriptive epidemiology are:
Person: these include age, gender, ethnic group, genetic predisposition, concurrent disease, diet,
physical activity, smoking, risk-taking behavior, SES, education, occupation, etc.

Place: these include climate, geology, presence of agents/vectors, population density,
economic development, nutritional practices, medical practices, etc.

Time: these include calendar time, time since an event, seasonality, temporal trends, etc.

Common study designs in Epidemiology

Cross-sectional studies
In a cross-sectional study, we measure the frequency of a particular exposure(s) and / or outcome(s)
in a defined population at a particular point in time. As cross-sectional studies collect data on
existing (prevalent) cases, they are sometimes called prevalence studies. In a descriptive cross-
sectional study, we simply describe the frequency of the exposure(s) or outcome(s) in a defined
population. In an analytic cross-sectional study, we simultaneously collect information on both the
outcome of interest and the potential risk factor(s) in a defined population. We then compare the
prevalence of the outcome in the people exposed to each risk factor with the prevalence in those
not exposed.

Cross-sectional studies are analyzed using prevalence (risk) ratios or odds ratios as the measure of
effect, because there is no follow-up time involved.

Case-control studies
Case-control studies provide a relatively simple way to investigate causes of diseases, especially
rare diseases. When we use a case-control study to investigate the association between an
exposure and an outcome, we start by identifying individuals with the outcome of interest
(cases), and compare them to individuals without this outcome (controls). We obtain information
about one or more previous exposures from cases and controls, and compare the two groups to

see if each exposure is significantly more (or less) frequent in cases than in controls. The case-
control method (in particular, the method of analysis) was developed in the early 1950s as an
approach to the problem of investigating risk factors for diseases with long latent periods (a long
period of time between the exposure and the disease outcome), where cohort studies are
impractical.
Case-control studies are longitudinal, in contrast to cross-sectional studies. Case-control studies
have been called retrospective studies since the investigator is looking backward from the disease
to a possible cause. This can be confusing because the terms retrospective and prospective are also
used to describe the timing of data collection in relation to the current date. In this sense a case-
control study may be either retrospective, when all the data deal with the past, or prospective, in
which data collection continues with the passage of time.

Selection of Controls
This is probably the most difficult part about designing a case-control study. The key point is
that controls must be representative of the population that produced the cases, but must not have
the outcome in question. This is the main limitation of a case-control study (selection bias). To
address it, we consider the following:
a) Include more than one group of controls
Ideally, we should have one control group which represents the population which produced
the cases. However, this may not always be easy to achieve, and sometimes it may be useful
to have more than one group of controls.
b) How many controls per case?
There may be advantages in having more than one control (of the same type) per case. In a
case-control study we compare the prevalence of exposure in cases and controls. We can
improve the statistical precision of this estimate by increasing the number of controls.

c) Should the controls be matched to the case?


Matching is a method of controlling for confounding. It is also a way of increasing the
efficiency of a case-control study, as we will discuss in a moment. When we match controls to
cases, we select controls that are similar to each case with respect to one or more
characteristics. The characteristics most often matched are age and sex. When we conduct a
one-to-one matching, we need to use appropriate statistical analysis methods to account for the
clustering introduced (conditional logistic regression).

Case-control studies are analyzed using odds ratios only.

Cohort studies
Cohort studies start by measuring exposure to a risk factor of interest. Individuals are classified
by their exposure status, and then followed over a period of time to see whether they develop one
or more outcomes. Variables of interest are specified and measured and the whole cohort is
followed up to see how the subsequent development of new cases of the disease (or other
outcome) differs between the groups with and without exposure. Because the data on exposure
and disease refer to different points in time, cohort studies are longitudinal.

Cohort studies are analyzed using rates and rate ratios to account for the follow-up time.

Measuring Disease Occurrence
Epidemiology studies the distribution of diseases in populations and factors related to them.

This definition leads to two questions:


1. How can we measure diseases and their distributions?
• Morbidity
➢ Prevalence
➢ Incidence
• Mortality
➢ Incidence

2. How can we measure differences in disease occurrence in different populations?


• Epidemiological study
➢ Cross-sectional
➢ Case-Control
➢ Cohort
➢ Randomized clinical trial

• Epidemiological measures of effect


➢ Differences in disease risk
➢ Ratios in disease risk
➢ Relative difference in disease risk

Measuring Disease Occurrence: Prevalence


Prevalence:
is the proportion (denoted as p) of a specific population having a particular disease. p is a number
between 0 and 1; if multiplied by 100, it is a percentage.

Examples
In a population of 1000 there are 50 cases of malaria:
p = 50/1000 = 0.05 or 5%.

In a population of 10,000 there are 4 cases of skin cancer:

p = 4/10,000 = 0.0004 or 0.04%.

Epidemiological terminology
In epidemiology, disease occurrence is frequently small relative to the population size.
Therefore, the proportion figures are multiplied by an appropriate number such as 10,000. In the
second example above, we have a prevalence of 4 per 10,000 persons.

Exercise
In a county with 2,300 inhabitants there have occurred 2 cases of leukemia. What is the prevalence
of leukemia per 100,000 persons?

Quantitative Aspects:
What are the Variance and Confidence Interval for the Prevalence?

For a sample of n individuals, let Xi = 1 if individual i has the disease and Xi = 0 otherwise, so that the estimated prevalence is

p̂ = (X1 + X2 + … + Xn) / n.

Computing the variance of the prevalence: the number of cases follows a binomial distribution (remember the variance of a binomial distribution is np(1 − p)), so the variance of p̂ is p(1 − p)/n, estimated by p̂(1 − p̂)/n.

For large samples, p̂ is approximately normal. Using the normal distribution for p̂, the estimate lies within 1.96 standard errors of the true prevalence with 95% probability. We can show the prevalence 95% CI will be:

p̂ ± 1.96 × √( p̂(1 − p̂) / n )
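
As a hedged illustration (the original R output is not reproduced in these notes), the prevalence and its approximate 95% CI for the malaria example above (50 cases in a population of 1,000) can be computed in R:

```r
# Prevalence and approximate 95% CI (malaria example: 50 cases among 1,000 people)
cases <- 50
n <- 1000

p_hat <- cases / n                          # estimated prevalence
se    <- sqrt(p_hat * (1 - p_hat) / n)      # standard error of the prevalence
ci    <- p_hat + c(-1, 1) * 1.96 * se       # normal-approximation 95% CI
c(prevalence = p_hat, lower = ci[1], upper = ci[2])
```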

Example

Exercise
In a county with 2300 inhabitants there have occurred 2 cases of leukemia. What is the
prevalence with 95% CI?

Measuring Disease Occurrence: Incidence


Incidence: is the proportion (denoted as I) of a specific, disease-free population developing a
particular disease in a specific study period. I is a number between 0 and 1; if multiplied by 100,
it is a percentage.

Examples
In a malaria-free population of 1000 there are four new cases of malaria within one year: I =
4/1000 = 0.004 or 0.4%.
In a skin-cancer-free population of 10,000 there are 11 new cases of skin cancer: I = 11/10,000 =
0.0011 or 0.11%.

Exercise
In a rural county with 2000 children within pre-school age there have occurred 15 new cases of
leukemia within 10 years. What is the incidence?

Quantitative Aspects: How do we determine the Variance and Confidence Interval for the
Incidence?

Computing the variance of the incidence: the incidence is also a proportion, so by the same binomial argument used for the prevalence, the variance of Î is I(1 − I)/n, estimated by Î(1 − Î)/n, where n is the size of the disease-free population at the start of the study period.

Consequently, the 95% confidence interval for the incidence is

Î ± 1.96 × √( Î(1 − Î) / n )

Example

Exercise
In a rural county with 2000 children within pre-school age there have occurred 15 new cases of
leukemia within 10 years. What is incidence with 95% CI?

Measuring Disease Occurrence: Incidence Density


Incidence Density: is the rate (denoted as ID) of a specific, disease-free population developing a
particular disease with respect to a specific study period of length T. ID is a positive number, but
not necessarily between 0 and 1.

Estimating incidence density

Examples

A cohort study is conducted to evaluate the relationship between dietary fat intake and the
development of prostate cancer in men. In the study, 100 men on a high-fat diet are compared with
100 men on a low-fat diet. Both groups start at age 65 and are followed for 10 years. During
the follow-up period, 10 men in the high-fat intake group are diagnosed with prostate cancer and
5 men in the low-fat intake group develop prostate cancer.

Each group contributes 100 × 10 = 1,000 person-years. The incidence density is ID = 10/1,000 = 0.01 per
person-year in the high-fat intake group and ID = 5/1,000 = 0.005 per person-year in the low-fat intake group.

Most useful generalization

The most useful generalization occurs when persons are under risk for different lengths of time and
hence contribute differently to the person-time denominator.

Estimating incidence density with different risk-times

Examples
Consider a population of n = 5 factory workers with X2 = 1 and all other Xi = 0 (here the disease of
interest might be a lung disease). During follow-up, there was only one case of lung cancer.
We also have T1 = 12, T2 = 2, T3 = 6, T4 = 12, T5 = 5, so that

ID = (X1 + … + X5) / (T1 + … + T5) = 1/(12 + 2 + 6 + 12 + 5) = 1/37 ≈ 0.027 cases per person-time unit.
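
A minimal R sketch of the same calculation, using the event indicators and risk times from this example:

```r
# Event indicators (X2 = 1, all other Xi = 0) and person-time at risk (T1..T5)
x <- c(0, 1, 0, 0, 0)
t <- c(12, 2, 6, 12, 5)

id <- sum(x) / sum(t)   # incidence density = number of cases / total person-time
id                      # 1/37, approximately 0.027 cases per person-time unit
```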

Interpretation of incidence density:

Quantitative Aspects for the Incidence Density

Example

Measures of effect: Risk Difference

Example 1

Example 2

Exercise

Distribution of number of diseased

Variance of RD

A confidence interval for RD

Example 1

Measures of effect: Risk Ratio and Odds Ratio


Risk ratio (RR):

Interpretation:

Example 1

This means the risk of developing caries was 60% lower among the children using the new
toothpaste.

Example 2

This means that persons with sun exposure were 40 times as likely to develop skin cancer as
the non-exposed.

Exercise 1

Estimator of RR

Variance of RR

The variance of log(RR) can be derived using the delta method (https://en.wikipedia.org/wiki/Delta_method).

A confidence interval for RR

Example

Odds
The odds of an outcome is the ratio of the number of times the outcome occurs to the number of times
it does not.
Suppose that p is the probability of the outcome; then

odds = p / (1 − p).

It follows that

p = odds / (1 + odds).
Odds Ratio

Examples

Example

A confidence interval for OR

Generalized linear models
The Generalized Linear Model (GLiM, or GLM) is an advanced statistical modelling technique
formulated by John Nelder and Robert Wedderburn in 1972. It is an umbrella term that
encompasses many other models and allows the response variable y to have an error
distribution other than the normal distribution. The models include linear regression, logistic
regression, and Poisson regression.

Examples

Introduction to Logistic Regression

Properties of the Logit

Interpretation of parameters α and β

The logistic regression model

In its simplest form, with a single exposure x, the model is logit(p) = log(p / (1 − p)) = α + βx, where p is the probability of the outcome.

Examples from R software

Compute Risk ratio with 95%CI
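
The R code used in the original notes is not shown here; below is a minimal sketch of how a risk ratio and its 95% CI could be computed from a 2×2 table. The counts (a, b, c, d) are hypothetical.

```r
# Hypothetical 2x2 table
#              Diseased   Not diseased
# Exposed        a = 30         b = 70
# Unexposed      c = 10         d = 90
a <- 30; b <- 70; cc <- 10; d <- 90

risk_exp   <- a / (a + b)        # risk among the exposed
risk_unexp <- cc / (cc + d)      # risk among the unexposed
rr <- risk_exp / risk_unexp      # risk ratio

# 95% CI via the delta method on log(RR)
se_log_rr <- sqrt(1/a - 1/(a + b) + 1/cc - 1/(cc + d))
ci <- exp(log(rr) + c(-1, 1) * 1.96 * se_log_rr)
c(RR = rr, lower = ci[1], upper = ci[2])
```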

Compute Odds Ratio with 95%CI
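
Similarly, a sketch of the odds ratio and its 95% CI from the same hypothetical counts:

```r
# Same hypothetical 2x2 counts as above
a <- 30; b <- 70; cc <- 10; d <- 90

or <- (a * d) / (b * cc)                     # odds ratio

# 95% CI via the delta method on log(OR) (Woolf's method)
se_log_or <- sqrt(1/a + 1/b + 1/cc + 1/d)
ci_or <- exp(log(or) + c(-1, 1) * 1.96 * se_log_or)
c(OR = or, lower = ci_or[1], upper = ci_or[2])
```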

Run GLM model for logistic regression
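
A minimal sketch of a logistic regression fitted with glm(); the data frame dat below is simulated purely for illustration and is not the data set used in the original notes.

```r
set.seed(1)
# Simulated data standing in for the study data (hypothetical)
dat <- data.frame(exposed = rbinom(200, 1, 0.5))
dat$disease <- rbinom(200, 1, plogis(-1 + 0.8 * dat$exposed))

fit <- glm(disease ~ exposed, family = binomial, data = dat)
summary(fit)

# Coefficients are on the log-odds scale; exponentiate for odds ratios with Wald 95% CIs
exp(cbind(OR = coef(fit), confint.default(fit)))
```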

The regression coefficients are on the log-odds scale; either exponentiate them or use the “epiDisplay”
package to display the odds ratios and 95% CIs.

Logistic regression with more than one exposure variable

Multivariable logistic regression models
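
Continuing the simulated data from the sketch above, additional exposure variables are simply added to the model formula (the covariates age and sex here are illustrative):

```r
# Add illustrative covariates to the simulated data
dat$age <- rnorm(200, mean = 40, sd = 10)
dat$sex <- rbinom(200, 1, 0.5)

fit2 <- glm(disease ~ exposed + age + sex, family = binomial, data = dat)
exp(cbind(OR = coef(fit2), confint.default(fit2)))   # mutually adjusted odds ratios
```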

Conditional logistic regression in matched case-control studies

Conditional logistic regression is appropriate for (individually) matched case-control data. It is


usually not appropriate for frequency matched case control data, which should be analyzed using
ordinary logistic analysis with stratum as a covariate. Note that matching is done to avoid
selection bias of the controls by creating controls that are similar to the cases save for the
outcome.

The conditional logistic regression model can then be specified as below:

Example
We will use data from the Infertility after Spontaneous and Induced Abortion study.

Let us run a conditional logistic regression to answer the question: what is the effect of previous
spontaneous abortions and induced abortions on the odds of being infertile?
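
A sketch of this analysis using the infert data set shipped with R and clogit() from the survival package; the variable stratum identifies the matched sets.

```r
library(survival)

# infert: case (1 = infertile), counts of prior spontaneous and induced abortions,
# and stratum identifying each matched set
fit_clogit <- clogit(case ~ spontaneous + induced + strata(stratum), data = infert)
summary(fit_clogit)       # coefficients on the log-odds scale
exp(coef(fit_clogit))     # conditional odds ratios
```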

From the conditional logistic regression model, both exposures were positively associated with the
odds of being infertile.

Poisson regression

Example

Inference about Model Parameters

From the above example

Example 2
Lung cancer deaths in British male physicians (Frome, 1983).

A data frame with 63 observations on the following 4 variables. (?lung.cancer)

• years.smok a factor giving the number of years smoking


• cigarettes a factor giving cigarette consumption
• Time man-years at risk
• y number of deaths
a) We can answer the question: what is the effect of years of smoking on the risk of death
from lung cancer?
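
A sketch of the Poisson model with a person-years offset; it assumes a data frame named lung.cancer containing the variables listed above (years.smok, cigarettes, Time, y).

```r
# Assumes a data frame `lung.cancer` with years.smok, cigarettes, Time (man-years) and y (deaths)
fit_years <- glm(y ~ years.smok, offset = log(Time), family = poisson, data = lung.cancer)
summary(fit_years)

# Exponentiate to obtain incidence (rate) ratios with Wald 95% CIs
exp(cbind(IRR = coef(fit_years), confint.default(fit_years)))
```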

What is the incidence rate ratio of smoking for 20 to 24 years versus 15 to 19 years?

From the regression model, smoking for more than 30 years, compared with 15 to 19 years, was significantly
associated with a higher rate of lung cancer mortality (all P-values < 0.05, and the 95% CIs do not
include one).

The 95% CIs are very wide, suggesting imprecise estimates, probably because of a lack of
power (small sample size).

b) What is the effect of the number of cigarettes smoked on the risk of death from lung
cancer?

From the results, we can see that smoking one or more cigarettes, compared with not smoking, was
associated with a higher rate of lung cancer mortality.

Would the effect of the number of cigarettes smoked and years of smoking change if we ran a
multivariable model with both exposure variables? Would the model performance improve?
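
A sketch of the single-exposure and multivariable Poisson models, with AIC() used to compare model performance (same assumed lung.cancer data frame):

```r
fit_cig  <- glm(y ~ cigarettes, offset = log(Time), family = poisson, data = lung.cancer)
fit_both <- glm(y ~ years.smok + cigarettes, offset = log(Time), family = poisson, data = lung.cancer)

AIC(fit_cig, fit_both)                                        # compare model fit
exp(cbind(IRR = coef(fit_both), confint.default(fit_both)))   # mutually adjusted rate ratios
```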

From the results above, we see the model performance based on the AIC (Akaike information
criterion) has improved (the AIC has dropped from 444.08 in model 1 above to 201.31 in this
model).

We see from the multivariable model that smoking 1 to 9 cigarettes is no longer associated with lung
cancer mortality. The IDRs for years smoked are adjusted for the number of cigarettes
smoked, and vice versa.

Survival analysis
In logistic regression, we were interested in studying how risk factors were associated with the
presence or absence of disease. Sometimes, though, we are interested in how a risk factor or
treatment affects the time to disease or some other event. Or we may have study dropout, and therefore
subjects for whom we are not sure whether they had the disease or not. In these cases, logistic
regression is not appropriate.

Survival analysis is used to analyze data in which the time until the event is of interest. The
response is often referred to as a failure time, survival time, or event time. Examples: time to death
after diagnosis of cancer or HIV, time to relapse after malnutrition treatment, time to recovery
from malaria etc.

Survival analysis is used to analyze the rates of occurrence of events over time, without
assuming the rates are constant. Generally, survival analysis allows for

• modeling the time until an event occurs, or


• compare the time-to-event between different groups, or
• assess how time-to-event correlates with quantitative variables.

The survival time response is usually continuous but may be incompletely determined for some
subjects; i.e., for some subjects we may only know that their survival time was at least some
time t, whereas for other subjects we know the exact time of the event. Incompletely
observed responses are censored. The survival time is always ≥ 0.

Censoring is present when we have some information about a subject’s event time, but we don’t
know the exact event time. For the analysis methods we will discuss to be valid, censoring
mechanism must be independent of the survival mechanism.
There are generally three reasons why censoring might occur:
• A subject does not experience the event before the study ends.
• A person is lost to follow-up during the study period
• A person withdraws from the study
These are all examples of right-censoring.
Right censoring occurs when a subject leaves the study before an event occurs, or the study
ends before the event has occurred. For example, consider patients in a clinical trial to study
the effect of treatments on stroke occurrence. The study ends after 5 years. Those patients who
have had no strokes by the end of the study are right censored.
Left censoring is when the event of interest has already occurred before enrolment. This is very
rarely encountered.

Regardless of the type of censoring, we must assume that it is non-informative about the event;
that is, the censoring is caused by something other than the impending failure.

Hazard and Cumulative Hazard

The hazard h(t) is the instantaneous rate at which events occur at time t among subjects still at risk, and the cumulative hazard H(t) is the hazard accumulated up to time t.

Survival Function

The survival function S(t) = P(T > t) is the probability that a subject remains event-free beyond time t.

Kaplan-Meier Curve

Kaplan–Meier method is a nonparametric method for survival analysis. It assumes no specific


distribution of survival times and does not assume a relationship between survival times and
independent variables.

Kaplan–Meier method estimates the survival probability from the observed survival times (both
censored and uncensored). Survival probability is plotted against time t in Kaplan–Meier
survival curve. The survival curve is useful for understanding the median survival time (time at
which survival probability is 50%).

Kaplan–Meier method is suitable for simple survival analysis and does not consider other
independent variables (confounding factors) while analyzing survival curves. If there are other
confounding factors that you want to include in model, you should use Cox proportional hazards
(PH) model (Cox regression).

Proportional Hazards Assumption

One of the objectives in survival analysis is to compare the survival functions in different groups,
e.g., leukemia patients as compared to cancer-free controls. If both groups were followed until
everyone died, both survival curves would end at 0%. However, one group might have survived
on average a lot longer than the other group. Survival analysis addresses this problem by
comparing the hazard at different times over the observation period. Proportional hazards
assumption says that the ratio of hazards between groups is constant over time.

Cox Proportional Hazards Model

Cox proportional hazards (CPH) model is a semiparametric model. It analyzes multiple


independent variables for estimating differences between the survival curves. Independent
variables can include the variable of interest (e.g. treatments) and other potential confounders
(e.g. age of the patients).

CPH model uses the hazard function instead of survival probabilities or survival time. The
hazard function is a measure of effect in CPH model.

Note that the Cox proportional hazards regression model makes the following assumptions:
a) Non-informative censoring, i.e. the censoring is independent of the event.
b) Proportional hazards, i.e. a constant HR over time.
c) There should be a linear relationship between the log of the hazard ratio and the independent
variables.
d) The independent variables should be independent of survival time, i.e. independent
variables should not change with time.

The hazard ratio (HR) is similar to the relative risk and is described as the ratio of hazard rates or failure
rates in two treatment groups (e.g. treated vs. control group). HR = 1 indicates that there is
no difference between the two groups; HR > 1 indicates that the event of interest is more likely
to occur in the first (e.g. treated) group, and vice versa.

Log-rank test
Log-rank test is a nonparametric hypothesis test, which compares two or more survival curves
(i.e. survival times for two different condition groups).

The hypothesis for Log-rank test is given as,


Null hypothesis: there are no differences in the survival curves between group1 and group2
Alternate hypothesis: there are differences in the survival curves between group1 and group2

Log-rank test calculates the test statistics by comparing the observed number of events to the
expected number of events in underlying condition groups,

The log-rank test value is compared against the critical value from the χ² distribution with k − 1 degrees of
freedom, where k is the number of groups. The drawback of the log-rank test is that it does not analyze
other independent variables affecting the survival time.

Let's fit a survival analysis model in R using the NCCTG Lung Cancer Data.
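
The code itself is not reproduced in these notes; a minimal sketch using the lung data that ship with the survival package:

```r
library(survival)

# NCCTG lung cancer data: time (days), status (1 = censored, 2 = dead)
km <- survfit(Surv(time, status) ~ 1, data = lung)
print(km)      # number of events and median survival with 95% CI
summary(km)    # survival estimates at each event time
```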

The survfit function fits the survival model and returns the number of events and the median survival
time (95% CI). Of the 228 cancer patients, 165 died; the median survival time was 310 (95% CI 285 to
363) days.

We also get the times at which the actual deaths occurred:

We see the first death occurred after 5 days, and the survival function at day 5 was 0.9956. The
next death occurred on day 11 (3 deaths) and the survival function was 0.9825.

Next we ask the question: was the survival time similar between males and females (sex)?
a) We will use the log-rank test to compare the survival times between the two groups
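
Continuing the sketch above, the log-rank test and the Kaplan-Meier curves by sex (1 = male, 2 = female) could be obtained as follows:

```r
survdiff(Surv(time, status) ~ sex, data = lung)   # log-rank test

km_sex <- survfit(Surv(time, status) ~ sex, data = lung)
plot(km_sex, col = c("blue", "red"), xlab = "Days", ylab = "Survival probability")
legend("topright", legend = c("Male", "Female"), col = c("blue", "red"), lty = 1)
```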

From the log-rank test (P = 0.001), it is clear the survival times were different. The null hypothesis
tested is that the survival times are not different between the two groups. We also plot the
Kaplan-Meier curve to demonstrate this.

What was the effect of sex on hazard of cancer death?
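
A sketch of the corresponding Cox model:

```r
cox_sex <- coxph(Surv(time, status) ~ sex, data = lung)
summary(cox_sex)   # hazard ratio for sex (2 = female) with 95% CI
```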

Females had a 41% lower hazard of cancer death (hazard ratio 0.588, 95% CI 0.42 to 0.82,
P = 0.001).

We can also test if the proportional hazard assumption was met using Schoenfeld residuals.
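
A sketch of the Schoenfeld residual test with cox.zph(), continuing from the model above:

```r
cox.zph(cox_sex)          # global and per-covariate tests of proportional hazards
plot(cox.zph(cox_sex))    # scaled Schoenfeld residuals over time
```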

The evidence against the proportional hazards assumption is very weak.

Here is a multivariable Cox proportional hazards model:
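
A sketch of such a model; the added covariates (age, ph.ecog) are columns of the lung data and are chosen here only for illustration.

```r
cox_multi <- coxph(Surv(time, status) ~ sex + age + ph.ecog, data = lung)
summary(cox_multi)
cox.zph(cox_multi)    # check the proportional hazards assumption for each covariate
```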

There is no evidence against the proportional hazards assumption.

Regression techniques for correlated outcomes


All the regression models we have covered so far assume:
Independence: All observations are independent of each other, residuals are uncorrelated.

If your data points are correlated, this assumption of independence is violated. Fortunately, there
are still ways to produce a valid regression model with correlated data.

Correlated Data

Correlation in data occurs primarily through multiple measurements (e.g. two measurements are
taken on each participant 1 week apart, and data points within individuals are not independent) or
if there is clustering in the data (e.g. a survey is conducted among students attending different
schools, and data points from students within a given school are not independent).

The result is that the outcome has been measured at the level of the individual observation,
but there is a second level, either the individual (in the case of multiple time points) or the
cluster, at which data points can be correlated. Ignoring this correlation means that the
standard error cannot be accurately computed, and in most cases it will be artificially low.

Why are individuals in clusters more similar? Some reasons are:


• Individuals in a community tend to behave or respond more like other people in the same
community than others in a different community.
• Individuals may have a level of exposure more like others in the same community than
individuals in a different community.
• An infected individual is more likely to transmit their infection to an individual in the same
community than to an individual in a different community.

Standard regression models compute the standard error (SE) using the number of individuals in
a study.

The correlation induced, for whatever reason, between individuals in a community can be
measured by the intra-cluster correlation, ρ ('rho') (also called the within-cluster correlation).
ρ = 0 means that responses of individuals within the same cluster are no more alike than
those of individuals from different clusters.
ρ = 1 means that all responses of individuals within the same cluster are identical.
An alternative way of thinking about intra-cluster correlation is in terms of between-
cluster variation.

As previously mentioned, simple regression will produce inaccurate standard errors with
correlated data and therefore should not be used.

Instead, you want to use models that can account for the correlation that is present in your data.
If the correlation is due to some grouping variable (e.g. school) or repeated measures over time,
then you can choose between Generalized Estimating Equations or Multilevel Models. These
modeling techniques can handle either binary or continuous outcome variables, so can be used to
replace either logistic or linear regression when the data are correlated.

Below are some of the methods for correcting the correlation.

Robust standard errors


One useful approach to derive standard errors that allow for the clustering is to use what
are called robust standard errors.
While model-based standard errors are based on predicted variability, robust standard errors are
based on observed variability. When error terms are correlated within clusters but independent
across clusters, then regular standard errors, which assume independence between all
observations, will be incorrect. Cluster-robust standard errors are designed to allow for
correlation between observations within cluster. Using robust standard errors, you can then
obtain appropriate confidence intervals and P-values.
Robust standard errors are based on the sum of the residuals.

Variance
The ri terms are the residuals.
The residuals are the difference between the outcome observed and the outcome predicted
by the model. When observations are independent, then the summation is performed on the
individual-level residuals. If data are "clustered", then cluster-level residuals are calculated
and summed over the clusters.
Note: This does not make any assumptions about independence within clusters but does
assume that there is independence between clusters.

Generalised estimating equations


One weakness of the robust standard error approach is that it ignores clustering when calculating
the effect estimates (e.g. the odds ratio) – it is only the standard errors that are adjusted. This means
that, for the calculation of the effect estimate, the same weight is given to an individual in a
household with many individuals as an individual who is the only contact in a household.
Generalised estimating equations (GEE) use robust standard errors, but also take account of
correlations when estimating the measure of effect, e.g. the odds ratio. Therefore, this method
gives different weights to individuals, depending on how many individuals are in the household.
When using GEEs, you must think about how the observations in a data set are likely to be
correlated with each other. The three standard options for this are given opposite.

Independence
This choice implies that you don't think the data are correlated. If you don't think the data are
correlated, you probably don't need to be using GEE.
Exchangeable
This choice implies that within a "cluster", e.g. a household, any two observations are equally
correlated, but that there is no correlation between observations from different "clusters". This is
a common choice.
Autocorrelation
This choice is useful for measures repeated over time, e.g. repeated measurements on the same
individual such as episodes of diarrhoea. Repeated measurements on an individual are most likely
to be most strongly correlated when they are made a short time apart. The greater the time interval
between two measurements the smaller the correlation is likely to be.
Summary
The main aspects of GEE analysis are:
1. GEE can include robust standard errors.
2. You need to specify how you think the data are correlated. The usual choice is 'exchangeable'.

3. If an exchangeable correlation is specified, point estimates, e.g., odds ratio, rate ratio, are
adjusted for correlations in the data.

Random effects models


Robust standard errors and generalised estimating equations are two practical approaches to
dealing with correlated observations. However, they are not based on a full (probability) model
for the data.
Therefore statisticians usually prefer to use another approach. The third approach is to use
random effects models, also known as multilevel models.
Random effects models include the variation between clusters explicitly in the likelihood and
therefore take account of intra-cluster correlations.
Summary
1. A random effects model specifies the form of the between-cluster variation and includes it in
the likelihood.
2. The point estimates, standard errors and log-likelihood obtained from a random effects model
all take account of the clustering (assuming that the random effects distribution is correctly
specified).
3. Likelihood ratio tests are valid.
4. Estimates of the between cluster variation and intra-cluster correlation are obtained.
5. There needs to be a reasonable number of "clusters" in the dataset for the method to be
reliable.
6. When performing random effects logistic regression analysis, the reliability of the estimates
should be checked, especially when ρ is large.

Example

We are going to use a case-control study conducted in nine sites (clusters). The outcome of
interest is being a case (dead) or a control (alive). There are many exposure variables for which we need
to find out whether they are associated with being a case, adjusting for the clustering within each site. We
assume that patients from each site are different from those at other sites, so if we ignore this
clustering, we will get invalid measures of effect (odds ratios). We will first run a glm logistic
regression model ignoring the clustering, then adjust for the clustering using robust standard errors.
Later we will run the GEE and the multilevel (random effects) models. Let us start by exploring the data:

In the columns we have the sites (clusters) and in the rows we have some of the exposures and
the outcome (adm_dead).

Let us start by running a multivariable logistic regression ignoring the clustering within the sites.
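
The data set itself is not reproduced here; the sketch below assumes a data frame dat containing the outcome adm_dead, the exposures sex, age, urban, abc and hiv, and a cluster identifier site.

```r
# Naive multivariable logistic regression ignoring clustering by site (variable names assumed)
fit_naive <- glm(adm_dead ~ sex + age + urban + abc + hiv, family = binomial, data = dat)
summary(fit_naive)
exp(cbind(OR = coef(fit_naive), confint.default(fit_naive)))   # odds ratios with Wald 95% CIs
```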

From the output, we see that the urban, abc and hiv exposures were significantly associated with being a
case (P-values < 0.05, marked with *). Can you go ahead and estimate the odds ratios plus 95% CIs?

For the exposure variable urban, the odds ratio is 0.4292 (exp(−0.8459)) and the 95% CI is from
0.2683 to 0.6865, as shown above.

Let us now run the same model, but use cluster-robust standard errors.
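
A sketch using cluster-robust standard errors from the sandwich package (with coeftest() from lmtest), continuing from the naive model above and clustering on the assumed site variable:

```r
library(sandwich)
library(lmtest)

# Same coefficients as the naive model, but standard errors clustered by site
coeftest(fit_naive, vcov = vcovCL(fit_naive, cluster = ~ site))
```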

Note that the estimated coefficients do not change, but the standard errors do (we now have robust
standard errors that account for clustering within each site); the z-values and the P-values therefore
change as well (z value = coefficient / std. error). Note also that the variable urban is now not
significantly associated with being a case.

We will now run the same model, but use a GEE model to correct for the clustering.
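
A sketch using geeglm() from the geepack package with an exchangeable working correlation, again assuming the data frame dat and cluster variable site:

```r
library(geepack)

dat <- dat[order(dat$site), ]   # GEE expects observations ordered by cluster
fit_gee <- geeglm(adm_dead ~ sex + age + urban + abc + hiv,
                  id = site, family = binomial, corstr = "exchangeable", data = dat)
summary(fit_gee)
```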

Looking at the results, now only HIV is associated with being a case; the coefficients and standard errors
are different.

Lastly, we run the random effects model.
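
A sketch of the random-effects (multilevel) logistic model using glmer() from the lme4 package, with a random intercept for site:

```r
library(lme4)

fit_re <- glmer(adm_dead ~ sex + age + urban + abc + hiv + (1 | site),
                family = binomial, data = dat)
summary(fit_re)   # includes the estimated between-site variance
```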

The random effects model explicitly models the clustering and provides the variance across the
sites (0.582). This is the most robust method, and I recommend using this approach.

Notice it is the abc and HIV variables that are significantly associated with being a case. Below
is a summary of the regression coefficients plus standard errors for all the models.

Exposure variable   Naïve model         Robust SE model     GEE model           Random effects model
Intercept           -0.24 (0.5488)      -0.24 (0.6959)      0.6348 (0.2585)*    -0.2748 (0.7010)
Sex                 -0.1115 (0.2363)    -0.1115 (0.1854)    -0.0712 (0.1162)    -0.1919 (0.2504)
Age                 -0.036 (0.0225)     -0.036 (0.0232)     -0.0011 (0.0007)    -0.0454 (0.0241)
Urban               -0.8459 (0.2397)*   -0.8459 (0.5060)    -0.0819 (0.07455)   -0.6699 (0.5788)
Abc_level1          0.7159 (0.3520)*    0.7159 (0.3615)*    0.0019 (0.0718)     0.7599 (0.3707)*
Abc_level2          1.6955 (0.3008)*    1.6955 (0.3197)*    0.00219 (0.11899)   1.732 (0.3184)*
HIV_level1          1.7129 (0.5320)*    1.7129 (0.3807)*    0.3317 (0.1033)*    1.4673 (0.5519)*
HIV_level2          0.8712 (0.4078)*    0.8712 (0.3695)*    0.13899 (0.1077)    0.533 (0.4423)
HIV_level9          -0.3785 (0.8254)    -0.3785 (0.9445)    -0.7504 (0.249)*    -0.4927 (0.8581)

Values are coefficient (standard error); * indicates P < 0.05.

Meta-analysis
It is now common for important clinical questions in medical research to be addressed in several
studies. This can be confusing for a medical practitioner. Which studies' results should be
followed? How can the information from the mass of data published be summarised? Meta-
analysis is a quantitative tool that can be used to summarise information from many studies. An
informal literature review can be too subjective and misleading, whereas meta-analysis can assist
with an overall conclusion. This combines results from different studies to give an overall
summary estimate (and confidence interval). The concept of meta-analysis is fairly easy to
understand. The purpose of meta-analysis is to summarise information from different studies.

Definition of meta-analysis (from Glass, 1976): The statistical analysis of a large collection of
analysis results for the purpose of integrating the findings.

Meta-analysis provides the highest level of evidence in medical research (it sits at the top of the evidence hierarchy).

An important step in a systematic review is the thoughtful consideration of whether it is appropriate
to combine the numerical results of all, or perhaps some, of the studies. Such a meta-analysis
yields an overall statistic (together with its confidence interval) that summarizes the effectiveness
of an experimental intervention compared with a comparator intervention. Potential advantages of
meta-analyses include the following:

1. To improve precision. Many studies are too small to provide convincing evidence about
intervention effects in isolation. Estimation is usually improved when it is based on more
information.
2. To answer questions not posed by the individual studies. Primary studies often involve a
specific type of participant and explicitly defined interventions. A selection of studies in
which these characteristics differ can allow investigation of the consistency of effect
across a wider range of populations and interventions. It may also, if relevant, allow
reasons for differences in effect estimates to be investigated.
3. To settle controversies arising from apparently conflicting studies or to generate new
hypotheses. Statistical synthesis of findings allows the degree of conflict to be formally
assessed, and reasons for different results to be explored and quantified.

Of course, the use of statistical synthesis methods does not guarantee that the results of a review
are valid, any more than it does for a primary study. Moreover, like any tool, statistical methods
can be misused.

Principles of meta-analysis

The commonly used methods for meta-analysis follow the following basic principles:

1. Meta-analysis is typically a two-stage process. In the first stage, a summary statistic is


calculated for each study, to describe the observed intervention effect in the same way for
every study. For example, the summary statistic may be a risk ratio if the data are
dichotomous, or a difference between means if the data are continuous.
2. In the second stage, a summary (combined) intervention effect estimate is calculated as a
weighted average of the intervention effects estimated in the individual studies. A
weighted average is defined as

weighted average = (sum of Yi × Wi) / (sum of Wi),

where Yi is the intervention effect estimated in the ith study, Wi is the weight given to the
ith study, and the summation is across all studies. Note that if all the weights are the same
then the weighted average is equal to the mean intervention effect. The bigger the weight
given to the ith study, the more it will contribute to the weighted average.

3. The combination of intervention effect estimates across studies may optionally
incorporate an assumption that the studies are not all estimating the same intervention
effect, but estimate intervention effects that follow a distribution across studies. This is
the basis of a random-effects meta-analysis. Alternatively, if it is assumed that each study
is estimating exactly the same quantity, then a fixed-effect meta-analysis is performed.
4. The standard error of the summary intervention effect can be used to derive a confidence
interval, which communicates the precision (or uncertainty) of the summary estimate; and
to derive a P value, which communicates the strength of the evidence against the null
hypothesis of no intervention effect.
5. As well as yielding a summary quantification of the intervention effect, all methods of
meta-analysis can incorporate an assessment of whether the variation among the results
of the separate studies is compatible with random variation, or whether it is large enough
to indicate inconsistency of intervention effects across studies.
6. The problem of missing data is one of the numerous practical considerations that must be
thought through when undertaking a meta-analysis. In particular, review authors should
consider the implications of missing outcome data from individual participants (due to
losses to follow-up or exclusions from analysis).

Meta-analyses are usually illustrated using a forest plot.


Inverse-variance approach to meta-analysis

A very common and simple version of the meta-analysis procedure is commonly referred to as
the inverse-variance method. This approach is implemented in its most basic form in RevMan,
and is used behind the scenes in many meta-analyses of both dichotomous and continuous data.

The inverse-variance method is so named because the weight given to each study is chosen to be
the inverse of the variance of the effect estimate (i.e. 1 over the square of its standard error).
Thus, larger studies, which have smaller standard errors, are given more weight than smaller
studies, which have larger standard errors. This choice of weights minimizes the imprecision
(uncertainty) of the pooled effect estimate.

There are two main approaches to meta-analysis:


1. Fixed effects method
2. Random effects method
The latter helps to deal with heterogeneity between studies.

Fixed-effect method for meta-analysis


This is the simplest method for calculating a summary estimate. However, this method makes the
assumption that each study is measuring the same true effect, e.g. odds ratio, rate ratio.
An overall estimate assumes the effect is the same in each strata.
The fixed-effects method:
• Assumes the true effect is the same in all studies
• Gives a weight to each individual study estimate

• Calculates a summary estimate by calculating a weighted average of the individual study
estimates.
So, we obtain a weighted average of the study estimates.

A fixed-effect meta-analysis using the inverse-variance method calculates a weighted average as

weighted average = (sum of Yi / SEi²) / (sum of 1 / SEi²),

where Yi is the intervention effect estimated in the ith study, SEi is the standard error of that
estimate, and the summation is across all studies. The basic data required for the analysis are
therefore an estimate of the intervention effect and its standard error from each study. A fixed-
effect meta-analysis is valid under an assumption that all effect estimates are estimating the same
underlying intervention effect, which is referred to variously as a ‘fixed-effect’ assumption, a
‘common-effect’ assumption or an ‘equal-effects’ assumption. However, the result of the meta-
analysis can be interpreted without making such an assumption.

Tests for heterogeneity


The fixed-effects method is based on the assumption that the true effect does not differ between
studies. This assumption should be checked, and if there is a difference then a random effects
method should be used to obtain a summary estimate.
The test for heterogeneity is based on the distance between the individual study estimates and the
summary estimate from the fixed-effects method. The test statistic is

Q = sum of Wi × (Yi − summary estimate)²,

which is large if the average distance between the individual study effects and the summary effect
is large.
This statistic is referred to the χ² distribution with k − 1 degrees of freedom, where k is the number of studies:
• if statistically significant, then there is evidence against the null hypothesis of a common effect
for all studies.
• if not statistically significant, there is no evidence for a heterogeneous effect across trials, i.e. we
would conclude that there is a common effect (homogeneity) across trials.
You do not have to do this calculation by hand – statistics packages that perform meta-analysis
will include a test for heterogeneity

Random-effects methods for meta-analysis


If the effects shown in each study differ, then a random effects method should be used to obtain
a summary estimate. The interpretation of the summary estimate is that it is a mean effect about
which it is assumed that the true study effects vary
In a random effects model:
• the true effects are allowed to differ, i.e. it allows for interaction
• the true effects vary randomly about the population average
• the between-study variance is estimated

• the variance is used to modify the study weights
The formula for the random-effects summary estimate is similar to that for the fixed-effects
summary estimate. The difference is the weighting. The weights include a between study
variance. This is complex and is not discussed here, you only need to know that this variance is
taken into account and that is how the random effects method differs from the fixed effects
method.

R indicates random effects estimate.


Most statistical packages will perform a random-effects meta-analysis.
In summary:
Fixed effects
• assumes the true effect is the same in each study
• does not allow for heterogeneity
• produces a narrow confidence interval

Random effects
• assumes the true effect differs between studies
• the true effects vary randomly about the population 'average'
• the between-study variance needs to be estimated
• weights for each trial incorporate the between-study variance

Examples
Let us use data from this study:

To conduct meta-analysis in R we need the package: metafor

Get help with metafor using the code: help("metafor")

We start by fitting a fixed-effect meta-analysis model (assuming no heterogeneity).
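
The study-level data used in the notes are not reproduced; the sketch below uses hypothetical log risk ratios (yi) and standard errors (sei) simply to show the metafor workflow.

```r
library(metafor)

# Hypothetical study-level data: log risk ratios and their standard errors
dat_ma <- data.frame(study = c("A", "B", "C", "D"),
                     yi    = c(-0.20, -0.10, -0.25, -0.05),
                     sei   = c(0.10, 0.15, 0.20, 0.12))

fe <- rma(yi = yi, sei = sei, data = dat_ma, method = "FE")   # fixed-effect (inverse-variance) model
summary(fe)                   # includes the Q test for heterogeneity and I^2

exp(coef(fe))                 # pooled effect back-transformed to a risk ratio
exp(c(fe$ci.lb, fe$ci.ub))    # 95% CI for the pooled risk ratio
forest(fe, atransf = exp)     # forest plot on the risk-ratio scale
```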

The I² tells us the percentage of heterogeneity; in this case it was 0%, meaning there was no variability
across the studies.
There is a formal statistical test for heterogeneity using a chi-square test; in this case the statistic was 1.24
and P = 0.74. The null hypothesis is that there is no heterogeneity. Since the P-value = 0.74,
we have no evidence to reject the null hypothesis, and we therefore conclude there is no
heterogeneity. Therefore, the assumption of no heterogeneity made by the fixed-effect method holds,
and we can use the fixed-effect method in this case.
From the results above, the pooled effect is -0.16 (this is on the log scale). To get the effect as a risk
ratio, we exponentiate this value.

We can also write a code to get the 95%CI for the pooled estimate

The pooled risk ratio is 0.85, 95%CI 0.79 to 0.92.


This can be displayed on the forest plot to visualize the results

The weights displayed are calculated using the inverse-variance method discussed earlier. Studies
that have very low variances (likely large sample sizes and very precise estimates) have the largest weight,
e.g. in this case the IDEAL study with 37% weight. A forest plot displays effect estimates and
confidence intervals for both individual studies and meta-analyses. Each study is represented by a
block at the point estimate of intervention effect with a horizontal line extending either side of the
block. The area of the block indicates the weight assigned to that study in the meta-analysis while
the horizontal line depicts the confidence interval (usually with a 95% level of confidence). The
area of the block and the confidence interval convey similar information, but both make different
contributions to the graphic. The confidence interval depicts the range of intervention effects
compatible with the study’s result. The size of the block draws the eye towards the studies with
larger weight (usually those with narrower confidence intervals), which dominate the calculation
of the summary result, presented as a diamond at the bottom.
Note that when there is no evidence of heterogeneity (P ≥ 0.05), both the fixed-effect and random-effects
methods yield essentially the same pooled measure (use the fixed-effect method). When the test of heterogeneity
P-value is < 0.05, there is evidence of heterogeneity, and the random-effects method should be used.

In the above example, since the test of heterogeneity gave P ≥ 0.05, you should use the fixed-effect method,
but the random-effects method will yield similar results, as shown below.
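
For completeness, a random-effects fit of the same hypothetical data only changes the method argument:

```r
re <- rma(yi = yi, sei = sei, data = dat_ma, method = "REML")   # random-effects model
exp(coef(re))               # pooled risk ratio under the random-effects model
forest(re, atransf = exp)
```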

Sample size and power calculations
It is important to ensure at the design stage that the proposed number of subjects to be recruited
into any study will be appropriate to answer the main objective(s) of the study. A small study
may fail to detect important effects on the outcomes of interest, or may estimate them too
imprecisely, no matter how good its design may be in other respects. A study larger than
necessary, while less common in practice, may waste valuable resources.

The sample size calculation is generally required at the study design stage, before patient
enrolment has begun. There are several reasons for this.
• Firstly, from a scientific perspective, testing too few might lead to failure to detect an
important effect, whereas testing too many might lead to detecting a statistically significant
yet clinically insignificant effect.
• Secondly, from an ethical viewpoint, testing too many subjects can lead to unnecessary
harm or potentially unnecessary sacrifice in the case of animal studies. Conversely, testing
too few is also unethical, as an underpowered study might not contribute to the evidence-
based field of medicine.
• Thirdly, from an economical perspective, testing too many will lead to unnecessary costs
and testing too few will be potentially wasteful if the trial is unable to address the scientific
question of interest.
For this reason, many funders and institutional review boards require an a priori sample size
calculation, which is included in the study protocol. Adaptive trial designs, whereby prespecified

modifications can be made to the trial after its inception, can potentially improve flexibility and
efficiency.

Components
There are four principal components required to calculate the sample size (Table below). These
components are specified via parameters. Working under a hypothesis testing framework, we
assume a null hypothesis (H0) and an alternative hypothesis (H1). In practice, we do not know the
‘truth’, so we base our inferences on a statistical test applied to a random sample from the
population.
Two types of error can occur. The first is a Type I error, where the null hypothesis is true, but we
incorrectly reject it. The second is a Type II error, where the null hypothesis is false, but we
incorrectly fail to reject it. Specification of the Type I (denoted as α) and Type II (denoted as β, but
more commonly reported as the complement: the power = 1 − β) error rate parameters is required
for the sample size calculation. Conventional choices are α = 0.05 and 0.01 (corresponding to a
significance level of 5% and 1%, respectively) and β = 0.2 and 0.1 (corresponding to 80% and
90% power, respectively). However, there are situations in which these parameters might be
increased or decreased, depending on the clinical context of the study.

The minimal clinically relevant difference is the smallest difference in outcome between the study
groups, or effect size, that is of scientific interest to the investigator.

Power Analysis in R

We demonstrate how to estimate sample size for two proportions
Example

Suppose we want to randomly sample male and female college undergraduate students and ask
them if they consume alcohol at least once a week. Our null hypothesis is no difference in the
proportion that answer yes. Our alternative hypothesis is that there is a difference. This is a two-
sided alternative; one gender has higher proportion but we don't know which. We would like to
detect a difference as small as 5%. How many students do we need to sample in each group if we
want 80% power and a significance level of 0.05?

Solution

The study power is 80%, α = 5%, and the effect size is 5% (p1 = 0.50, p2 = 0.55).
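
A sketch using base R's power.prop.test():

```r
power.prop.test(p1 = 0.50, p2 = 0.55, sig.level = 0.05, power = 0.80)
# n is roughly 1565 per group
```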

We will require an n of approximately 1,565 in each group.

Example 2
Suppose in the above example, we already know the proportion of male=10% and female=5%.
What would be the required sample size?
Solution
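
Again with power.prop.test(), now using the known proportions of 10% and 5%:

```r
power.prop.test(p1 = 0.10, p2 = 0.05, sig.level = 0.05, power = 0.80)
# the printed n is the required sample size per group
```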

Comparing means between groups

Example
Suppose a certain food supplement is being tested on children with severe acute malnutrition. The
researcher hypothesised that the food supplement would change the weight-for-length z-score from a
mean of -3.2 to -2.7 (an improvement of 0.5 z-scores). Assuming a common standard deviation of
1.5, power of 80% and alpha of 5%, what would be the required sample size?

Solution
Delta=0.5 (change from -3.2 to -2.7)
Common sd=1.5, Power=0.8, α=0.05
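
A sketch using power.t.test():

```r
power.t.test(delta = 0.5, sd = 1.5, sig.level = 0.05, power = 0.80)
# n is roughly 143 per group, i.e. about 286 children in total
```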

Therefore, the researcher will require a total of 286 children (~143 in each group).
