Value in Health - 2002 - Schechtman - Odds Ratio Relative Risk Absolute Risk Reduction and The Number Needed To Treat


Volume 5 • Number 5 • 2002

VALUE IN HEALTH

Odds Ratio, Relative Risk, Absolute Risk Reduction, and the
Number Needed to Treat—Which of These Should We Use?

Edna Schechtman, PhD


Department of Industrial Engineering and Management, Ben Gurion University of the Negev, Beer Sheva, Israel

ABSTRACT

Introduction: Statistical analyses of data and making sense of medical data have received much attention in the medical literature, but nevertheless have caused confusion among practitioners. Each researcher provides a different method for comparing treatments. For example, when the end point is binary, such as disease versus no disease, the common measures are odds ratios, relative risk, relative risk reduction, absolute risk reduction, and the number needed to treat. The question faced by the practitioner is then: Which one will help me in choosing the best treatment for my patient?

Methods: The purpose of this paper is to illustrate, using examples, how each measure is used, what it means, and what are its advantages and disadvantages.

Results: Some pairs of measures present equivalent information. Furthermore, it is shown that different measures result in different impressions.

Conclusion: It is recommended that researchers report both a relative and an absolute measure and present these with appropriate confidence intervals.

Keywords: odds ratio, risk reduction, number needed to treat, medical decision making.

Introduction

In recent years, the amount of available information in medical literature has increased rapidly, and as more studies are performed, the results have become more easily accessible. Even patients in the Internet era are aware of current research. The problem is how to judge the evidence in various published studies and decide whether it justifies changing the existing treatment for a new one. “At the mention of the term ‘statistics’, most physicians react with a groan of confusion and annoyance.” [1]

The main difficulty in the comparison of different treatments lies in the fact that they are almost never compared, in a preplanned study, against each other. Instead, most studies compare the new treatment with a placebo. Moreover, the end points of the studies may differ, the initial severity of the disease may be different, the studies look at different subsets of patients who had previously been exposed to different drugs, and the outcome criteria may be different. These problems imply limitations to any systematic review of placebo-controlled trials designed for regulatory purposes [2]. Obviously, the best way to compare several treatments is to design a study that will include all treatments to be compared, but that is a hard task to accomplish for regulatory purposes.

In the past few years, the issue of integrating the results of several independent studies has been the topic of many articles. While some suggest using only relative risk [3], or absolute risk reduction [4], others advocate use of the number needed to treat criteria [5,6], and some consider the odds ratio to be the method of choice [2]. Obviously, the choice of method is linked to the type of study and its design. For retrospective studies and for cross-sectional studies, in which the aim is to look at the association rather than differences, the odds ratio is recommended, while a relative risk or risk difference cannot be meaningfully calculated. Risk calculations are only meaningful in follow-up studies. Odds ratio is also used in case-control studies, in which the relative risk cannot be estimated.

Address correspondence to: Edna Schechtman, PhD, Statistician, Department of Industrial Engineering and Management, Ben Gurion University of the Negev, Beer Sheva, Israel. E-mail: [email protected]

© ISPOR 1098-3015/02/$15.00/431 431– 436 431


15244733, 2002, 5, Downloaded from https://onlinelibrary.wiley.com/doi/10.1046/j.1524-4733.2002.55150.x by Readcube (Labtiva Inc.), Wiley Online Library on [31/01/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
432 Schechtman

Methods

In the present article, we are mainly concerned with controlled studies. We will describe the different measures of treatment effects along with their advantages and disadvantages and summarize some of the debates regarding which one is to be used, as reported in the medical literature in recent years. Because the choice of treatment depends on the measure being used, it is important that the practitioner and the patient understand the differences between the measures. We hope that this understanding will help in choosing the proper measure for the case and recommend that both a relative and an absolute measure be reported to give a more complete picture.

Common Measures

Absolute risk reduction. The basic and simplest measure is the absolute risk reduction (ARR), also called the risk difference. That is, as a result of using the treatment, is the risk of an event reduced by a clinically meaningful amount? The calculation is just the difference between the risk of an event in the control group and the risk of an event in the treated group.

The advantage of the estimated ARR is that it is easy to compute, the confidence interval obtained is easy to interpret (and is readily available with standard statistical packages), it reflects both the underlying risk without treatment and the risk reduction associated with treatment, and has a clear meaning, which makes it appealing to the practitioner. A confidence interval that contains zero means that there is no significant difference between the treatment and the placebo in terms of risk. One disadvantage is that a difference in risk of fixed size may have greater importance when the risks are close to 0 or 1 than when they are near the middle of the range. A difference between 0.010 and 0.001, when considering the risk that people suffer serious side effects, is more noteworthy than the difference between 0.410 and 0.401 [7].

Number needed to treat. A related measure, based on the absolute risk reduction, is the number needed to treat (NNT), which is defined as the reciprocal of the absolute risk reduction. The meaning of this measure is the number of patients that need to be treated to get the desired outcome in one patient who would not have benefited otherwise. Also, when the outcome is binary, the cost-effectiveness ratio becomes the product of the incremental costs and the NNT. A confidence interval (CI) for NNT can be obtained by inverting the upper and lower confidence limits of absolute risk reduction. The NNT has both advantages and disadvantages that are discussed in the medical literature. It can be easily understood and used and “[...] should help us to make the best clinical decisions with our patients.” Elferink and Van Zwieten-Boot [6] encourage the use of NNT and state that NNT takes into account the absolute benefit and is a meaningful measure because it addresses both statistical and clinical significance in a way that is easily interpreted. It is worth noting that the numerical value of NNT is a function of the disease, the intervention, and the outcome [5]. An NNT of 10 when the outcome is very serious may be judged differently than an NNT of 5 for a milder outcome. Therefore, it is only appropriate to compare NNTs directly when treatments for the same condition, severity, and outcome are compared.

When there is no difference in risk between the treatment and control, the absolute risk reduction is zero and NNT is infinite. Also, when the difference is not significant, the CI for absolute risk reduction will include zero. Because a CI for NNT is obtained by taking reciprocals of the CI for ARR, we may get an ARR of 0.1, with a 95% CI of -0.05 to 0.25, which yields an NNT of 10 and a 95% CI of -20 to 4. There are two problems with this interval. First, NNT should be positive, and second, the CI does not include the point estimate (an NNT of 10 in this case) [8]. For such cases, McQuay and Moore suggest using only point estimates [5]. However, it is not satisfactory for a CI to be presented only when the result is significant [8].

The interpretation of a negative value for NNT is as follows: if NNT patients are treated with the new treatment, one fewer patient will benefit than if they were all treated with the control. When NNT is negative, it is called NNH, the number needed to harm. As ARR approaches zero, it means that there is almost no difference between the new treatment and the control, and therefore, infinitely many patients need to be treated for one to get well, who otherwise would not have. The problem of interpreting a CI such as (95% CI, -20 to 4) still exists, because ARR of zero translates into NNT equal to infinity. One simple solution is to report two separate intervals: NNH (20 to ∞) and NNT (4 to ∞). Altman [8] proposes combining both intervals into one statement: NNTH 20 to ∞ to NNTB 4.

To overcome the disadvantages, it has been suggested that NNT be accompanied by the control group event rate to which they apply and the relative risk and CI from which they are derived [3].
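The inversion of an ARR interval into an NNT interval described above can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the article: the function names `arr_with_ci` and `nnt_interval` are hypothetical, the ARR interval is assumed to be the simple large-sample (Wald) interval based on the sum of two binomial variances, and the output wording follows the NNTH/NNTB style attributed to Altman [8].

```python
import math


def arr_with_ci(events_control, n_control, events_treated, n_treated, z=1.96):
    """Return (ARR, lower, upper) using a large-sample (Wald) interval."""
    risk_control = events_control / n_control
    risk_treated = events_treated / n_treated
    arr = risk_control - risk_treated
    # Assumed variance: sum of the two binomial variances p(1 - p) / n.
    se = math.sqrt(
        risk_control * (1 - risk_control) / n_control
        + risk_treated * (1 - risk_treated) / n_treated
    )
    return arr, arr - z * se, arr + z * se


def _reciprocal(x):
    """1/|x|, with an ARR limit of exactly zero mapped to an infinite NNT."""
    return math.inf if x == 0 else 1 / abs(x)


def nnt_interval(arr_low, arr_high):
    """Describe the NNT interval implied by an ARR interval (NNTB/NNTH style)."""
    if arr_low > 0:
        # The whole ARR interval is on the benefit side.
        return f"NNTB {_reciprocal(arr_high):.1f} to {_reciprocal(arr_low):.1f}"
    if arr_high < 0:
        # The whole ARR interval is on the harm side.
        return f"NNTH {_reciprocal(arr_low):.1f} to {_reciprocal(arr_high):.1f}"
    # The ARR interval contains zero, so the NNT interval runs through infinity.
    return (
        f"NNTH {_reciprocal(arr_low):.1f} to inf "
        f"to NNTB {_reciprocal(arr_high):.1f}"
    )


# The example from the text: ARR = 0.1 with 95% CI -0.05 to 0.25.
print(nnt_interval(-0.05, 0.25))  # NNTH 20.0 to inf to NNTB 4.0

# Hypothetical counts: 300/1000 events on control, 50/1000 on treatment.
arr, low, high = arr_with_ci(300, 1000, 50, 1000)
print(f"ARR = {arr:.3f}, NNT = {1 / arr:.1f}, {nnt_interval(low, high)}")
```

Reporting the interval as text rather than as a numeric pair is a deliberate choice in this sketch: when the ARR interval crosses zero, a pair such as (-20, 4) hides the fact that the interval passes through infinity.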

Newcombe [4] suggests that absolute risk reduction is a more basic quantity, with much less potential to be misunderstood, and is preferable to the NNT because of the NNT’s singularity problem. He suggests that NNT and its CI be used as an alternative presentation when the absolute risk reduction is well away from zero.

Relative risk and relative risk reduction. The next two popular measures are relative risk (RR) and relative risk reduction (RRR). The relative risk of a treatment is the ratio of risks of the treated group and the control group, also called the risk ratio. The relative risk reduction is derived from the relative risk by subtracting it from one, which is the same as the ratio between the ARR and the risk in the control group.

RR is easy to compute and interpret and is included in standard statistical software. The CI is calculated by exponentiating the lower and upper limits of the CI for log(RR), which has the general form

$$\mathrm{CI} = \log(\mathrm{RR}) \pm 1.96 \times \mathrm{SE}(\log(\mathrm{RR})). \tag{1}$$

However, the simple method for calculating the CI does not perform well [9], and better methods such as EquivTest [10] and CIA [11] can be used, although they are not yet widely available.

One disadvantage of RR is that its value can be the same for very different clinical situations. For example, an RR of 0.167 would be the outcome for both of the following clinical situations: 1) when the risks for the treated and control groups are 0.05 and 0.3, respectively; and 2) when the risk is 0.14 for the treated group and 0.84 for the control group. RR is clear on a proportional scale, but has no real meaning on an absolute scale. Therefore, it is generally more meaningful to use relative effect measures for summarizing the evidence and absolute measures for application to a concrete clinical or public health situation [12].

Odds ratio. The odds ratio (OR) is a common measure of the size of an effect and may be reported in case-control studies, cohort studies, or clinical trials. It can also be used in retrospective studies and cross-sectional studies, where the goal is to look at associations rather than differences. The odds is the natural measure of effect size in logistic regression modeling and can be interpreted as the ratio between the number of patients who fulfill the criteria and the number who do not, or the number of events relative to the number of nonevents. The odds ratio is the ratio between the odds of the treated group and the odds of the control group. It can be obtained, along with its confidence interval, using standard statistical software. Both odds and odds ratios are dimensionless. An odds ratio less than 1 means that the odds have decreased, and similarly, an OR greater than 1 means that the odds have increased. It should be noted that ORs are hard to comprehend [13] and are frequently interpreted as a relative risk. Although the odds ratio is close to the relative risk when the outcome is relatively uncommon [12], there is a recognized problem that odds ratios do not give a good approximation of the relative risk when the initial risk is high [13,14]. Furthermore, an odds ratio will always exaggerate the size of the effect compared to a relative risk [15,16]. When the OR is less than 1, it is smaller than the RR, and when it is greater than 1, the OR exceeds the RR. However, the interpretation will not, generally, be influenced by this discrepancy, because the discrepancy is large only for large positive or negative effect size, in which case the qualitative conclusion will remain unchanged.

It is worthwhile to note that RR and OR are related as follows:

$$\mathrm{RR} = \mathrm{OR} \times \frac{1 + n_{21}/n_{22}}{1 + n_{11}/n_{12}}, \tag{2}$$

where $n_{11}$ is the frequency of (yes, group 1); $n_{21}$ is the frequency of (yes, group 2); $n_{22}$ is the frequency of (no, group 2); and $n_{12}$ is the frequency of (no, group 1).

This formula explains why OR approximates RR well when $n_{11}$ and $n_{21}$, the frequencies of the “yes” outcome, are small relative to $n_{12}$ and $n_{22}$, respectively. This is known as the “rare outcome assumption.”

The odds ratio is the only measure of association directly estimated from a logistic model, without requiring special assumptions and regardless of whether the study design is follow-up, case-control, or cross-sectional [17]. Risks can be estimated only in follow-up designs. In case-control and cross-sectional designs, the OR is a ratio, which depends on four probabilities as follows:

$$\mathrm{OR} = \frac{\hat{P}(E = 1 \mid D = 1)\,/\,\hat{P}(E = 0 \mid D = 1)}{\hat{P}(E = 1 \mid D = 0)\,/\,\hat{P}(E = 0 \mid D = 0)}, \tag{3}$$

where E = 1 if the patient was exposed, E = 0 otherwise, D = 1 if the patient has the disease, and D = 0 otherwise. It is worthwhile to note that risk cannot be estimated from case-control and cross-sectional studies because they require conditional

probabilities of the type $\hat{P}(D \mid E)$, which are not available.

Results

Hypothetical Example

A hypothetical example, used in part by McQuay and Moore [5], will be used to illustrate the different measures accompanied by their CIs. The study aims to compare the recurrence of migraine headaches in a control group receiving placebo and a treated group receiving a new antimigraine preparation. For the sake of illustration, we examine four different possible outcomes for the control and treatment groups, denoted by C1 and M1 for study 1, C2 and M2 for study 2, C3 and M3 for study 3, and C4 and M4 for study 4. It is assumed that all groups were of 1000 individuals.

At the end of the study, migraine recurred in 30% of control group C1 (risk, 0.3), 5% of treatment group M1, 84% of control group C2, 14% of treatment group M2, 10% of control group C3, 1.7% of treatment group M3, and in 95% and 70% for C4 and M4, respectively, as summarized in Table 1. The measures used are absolute risk reduction with 95% CI, risk, number needed to treat with 95% CI, relative risk with 95% CI, risk reduction, odds, and odds ratio with 95% CI.

It can be seen that:

1. The first three cases have the same relative risk and relative risk reduction, while case 4 is significantly different. However, the absolute risk reduction, NNT, and odds ratios are significantly different in the three cases studied. (For odds ratios, case 2 is different from cases 1 and 3, which are similar.)
2. Cases 1 and 4 have the same absolute risk reduction, NNT, and odds ratios, but very different relative risk, relative risk reduction, and risk at baseline.

Real Example

The following example [18] is a prospective study, which compares the incidences of dyskinesia after ropinirole (ROP) or levodopa (LD) in patients with early Parkinson’s disease. The results show that 17 of 179 patients who took ropinirole and 23 of 89 who took levodopa developed dyskinesia. The data are summarized in Table 2.

The risk of having dyskinesia among patients who took LD is 23/89 = 0.258, whereas the risk of developing dyskinesia among patients who took ROP is 17/179 = 0.095. Therefore, the absolute risk reduction is

ARR = 0.258 - 0.095 = 0.163.

The variance of ARR is given by

$$V(\mathrm{ARR}) = \frac{0.258\,(1 - 0.258)}{89} + \frac{0.095\,(1 - 0.095)}{179} = 0.00263. \tag{4}$$

Therefore, a 95% confidence interval for the difference in proportions is given by

$$0.163 \pm 1.96 \times \sqrt{0.00263} = (0.063,\ 0.264),$$

where 1.96 is the upper 2.5% point of the standard normal distribution, as used for 95% CIs.

The number needed to treat and its CI are obtained from ARR and its CI by taking the reciprocals: NNT = 1/ARR = 1/0.163 = 6.13, and its CI is given by (1/0.264, 1/0.063) = (3.79, 15.87).

The relative risk is 0.095/0.258 = 0.368. The confidence interval is obtained as follows: a CI for the log of RR is obtained, and the lower and

Table 1 The basic measures and corresponding 95% CIs for four cases*

               C1       M1       C2       M2       C3       M3       C4       M4
Event          300      50       840      140      100      17       950      700
No event       700      950      160      860      900      983      50       300
Risk of event  0.3      0.05     0.84     0.14     0.1      0.017    0.95     0.7
ARR            0.25              0.70              0.083             0.25
CI             0.217–0.283       0.656–0.744       0.062–0.104       0.217–0.283
NNT            4                 1.43              12.05             4
CI             3.53–4.60         1.34–1.52         9.65–16.02        3.53–4.60
RR             0.167             0.167             0.17              0.74
CI             0.125–0.222       0.143–0.195       0.102–0.282       0.706–0.769
RRR            0.833             0.833             0.83              0.26
Odds           0.429    0.053    5.25     0.163    0.111    0.017    19       2.33
OR             8.14     0.123    32.25    0.031    6.42     0.156    8.15     0.123
CI             0.090–0.168       0.024–0.04        0.092–0.262       0.090–0.168

*Ci and Mi (i = 1, ..., 4) are the hypothetical control and treated groups, respectively.
Abbreviations: ARR, absolute risk reduction; NNT, number needed to treat; RR, relative risk; RRR, relative risk reduction; OR, odds ratio.
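As a rough check on the point estimates in Table 1, the measures can be recomputed directly from the event counts. The sketch below is illustrative only and is not part of the original article: the helper names `effect_measures` and `rr_from_or` are hypothetical, RR and OR are taken as treated relative to control, and group 1 in Equation 2 is assumed to be the treated group. Confidence intervals are left out on purpose.

```python
def effect_measures(events_control, n_control, events_treated, n_treated):
    """Point estimates of the common measures from two-group event counts."""
    risk_control = events_control / n_control
    risk_treated = events_treated / n_treated
    odds_control = events_control / (n_control - events_control)
    odds_treated = events_treated / (n_treated - events_treated)
    arr = risk_control - risk_treated
    return {
        "ARR": arr,
        "NNT": 1 / arr,  # undefined when ARR is zero; not handled here
        "RR": risk_treated / risk_control,
        "RRR": 1 - risk_treated / risk_control,
        "OR": odds_treated / odds_control,
    }


def rr_from_or(odds_ratio, yes_1, no_1, yes_2, no_2):
    """Equation 2: RR = OR * (1 + n21/n22) / (1 + n11/n12)."""
    return odds_ratio * (1 + yes_2 / no_2) / (1 + yes_1 / no_1)


# (control events, control n, treated events, treated n) for the four cases.
cases = {
    1: (300, 1000, 50, 1000),
    2: (840, 1000, 140, 1000),
    3: (100, 1000, 17, 1000),
    4: (950, 1000, 700, 1000),
}

for label, (ec, nc, et, nt) in cases.items():
    m = effect_measures(ec, nc, et, nt)
    # Group 1 = treated, group 2 = control (an assumption of this sketch).
    rr_check = rr_from_or(m["OR"], et, nt - et, ec, nc - ec)
    print(
        f"case {label}: ARR={m['ARR']:.3f} NNT={m['NNT']:.2f} "
        f"RR={m['RR']:.3f} RRR={m['RRR']:.3f} OR={m['OR']:.3f} "
        f"RR via Eq. 2={rr_check:.3f}"
    )
```

The same function can be applied to the dyskinesia counts quoted in the text, `effect_measures(23, 89, 17, 179)`, which gives an ARR of about 0.163, an RR of about 0.37, and an OR of about 0.30, in agreement with the article's figures up to rounding.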

upper limits are transformed to obtain the desired interval.

The variance of log(RR) is given by

$$V(\log(\mathrm{RR})) = \frac{1}{23} - \frac{1}{89} + \frac{1}{17} - \frac{1}{179} = 0.08551.$$

Therefore, a 95% CI for log(RR) is given by

$$\log(0.368) \pm 1.96 \times \sqrt{0.08551} = (-1.5727,\ -0.4265).$$

Exponentiating the lower and upper confidence limits provides the 95% CI for RR: (0.207–0.653).

Table 2 Frequency of dyskinesia in patients with early Parkinson’s disease

              Presence of dyskinesia
              Yes      No       Total
Levodopa      23       66       89
Ropinirole    17       162      179
Totals        40       228      268

The odds of having dyskinesia for LD patients is 23/66 = 0.348. The odds of having dyskinesia for ROP patients is 17/162 = 0.105, and therefore the odds ratio OR is 0.105/0.348 = 0.302.

The procedure for obtaining a confidence interval is as follows: a CI for the log of OR is obtained, and the lower and upper limits are then transformed to obtain the desired interval.

The variance of log(OR) is given by

$$V(\log(\mathrm{OR})) = \frac{1}{23} + \frac{1}{66} + \frac{1}{17} + \frac{1}{162} = 0.1236.$$

Therefore, a 95% CI for log(OR) is given by

$$\log(\mathrm{OR}) \pm 1.96 \times \sqrt{0.1236} = -1.198 \pm 1.96 \times 0.35157 = (-1.887,\ -0.508).$$

Exponentiating the lower and upper limits, we obtain the 95% CI for OR as (0.151–0.602).

Because RR is clear on a proportional scale, but has no real meaning on an absolute scale, it might be best to report both: a relative effect measure for summarizing the evidence and an absolute measure for applying it to a concrete clinical or public health situation. For our example, all the statistics show that ROP is better at preventing dyskinesia. However, it is best to report that the risk with LD is about three times the risk with ROP and that, by using ROP, the risk of developing dyskinesia is reduced by 16 percentage points in absolute terms (ARR = 0.163). These two pieces of information complete the picture.

An interesting study reported by Malenka et al. [19] tested whether a patient’s perception of benefit is influenced by how the benefit is presented, in relative or absolute terms. They found that the framing of benefit or risk in relative versus absolute terms may have a major influence on patient preference. The medication whose benefits were expressed in relative terms was chosen by 56.8% of patients, whereas 14.7% chose the medication whose benefit was expressed in absolute terms.

Conclusion

The discussion above and the hypothetical example were aimed at showing that choice of treatment depends on the measure being used. Therefore, it is important that the practitioner understands what the different measures really express and which ones may be more appropriate for a specific patient setting. For example, ARR and NNT are absolute measures, whereas RR and RRR are relative measures. It is recommended that both a relative and an absolute measure be reported, to portray a more complete picture.

The author thanks Dr Rivka Inzelberg whose valuable comments helped improve this paper.

References

1 Wolf JS, Smith DS. Practical biomedical statistics: a guide to the selection of statistical tests. Urology 1996;47:2–12.
2 Marson AG, Kadir ZA, Chadwick DW. New antiepileptic drugs: a systematic review of their efficacy and tolerability. BMJ 1996;313:1169–74.
3 Cates C. Pooling numbers needed to treat may not be reliable. BMJ 1999;318:1764.
4 Newcombe RG. Confidence intervals for the number needed to treat: absolute risk reduction is less likely to be misunderstood. BMJ 1999;318:1764.
5 McQuay HJ, Moore RA. Using numerical results from systematic reviews in clinical practice. Ann Intern Med 1997;126:712–20.
6 Elferink AJA, Van Zwieten-Boot BJ. Analysis based on number needed to treat shows differences between drugs studied. BMJ 1997;314:603.
7 Agresti AA. Categorical Data Analysis. New York: Wiley, 1990.
8 Altman DG. Confidence intervals for the number needed to treat. BMJ 1998;317:1309–12.
9 Newcombe RG. Interval estimation for the difference between independent proportions: comparison of eleven methods. Stat Med 1998;17:873–90.
10 EquivTest [computer program]. Version 1.0. Cork: Statistical Solutions, 1998.
11 Altman DG, Machin D, Bryant TN, Gardner MJ, eds. Statistics with Confidence (2nd ed.). London: BMJ Books, 2000.

12 Egger M. Meta-analysis: principles and procedures. BMJ 1997;315:1533–7.
13 Davies HTO, Crombie IK, Tavakoli M. When can odds ratios mislead? BMJ 1998;316:989–91.
14 Sinclair JC, Bracken MB. Clinically useful measures of effect in binary analyses of randomized trials. J Clin Epidemiol 1994;47:881–9.
15 Deeks JJ. When can odds ratios mislead? BMJ 1998;317:1155–6.
16 Sackett DL, Deeks JJ, Altman DG. Down with odds ratios! Evidence-Based Med 1996;1:164–6.
17 Kleinbaum DG. Logistic Regression: A Self-Learning Text. New York: Springer-Verlag, 1994.
18 Rascol O, Brooks D, Korczyn AD, et al. A five-year study of the incidence of dyskinesia in patients with early Parkinson’s disease who were treated with ropinirole or levodopa. N Engl J Med 2000;342:1484–91.
19 Malenka DJ, Baron JA, Johansen S, et al. The framing effect of relative and absolute risk. J Gen Intern Med 1993;10:543–8.
