Borderline Significance

Interpreting and reporting clinical trials with results of borderline significance
Amy Kirkwood and Professor Allan Hackshaw
CRUK and UCL Cancer Trials Centre
What is the problem?

After submitting several phase III trials to high-profile journals, we noticed a disparity in the language they allowed us to use when forming conclusions about borderline results. Some initially stated that a p-value just above 0.05 indicated that there was no effect, despite a clinically relevant effect size. Should these results really just be ignored?
Why do we stick to a 0.05 limit?

The cut-off of 0.05 was first suggested by RA Fisher in 1925 as being low enough to make decisions, and it has become widely adopted. It is an arbitrary cut-off, but many researchers seem to adhere to it strictly.
Examples
Relative risk of 0.75 (95% CI: 0.57-0.99), p-value = 0.048. Clear evidence of an effect?
Relative risk of 0.75 (95% CI: 0.55-1.03), p-value = 0.07. No effect?
Size of treatment effects

In the past, many new interventions were compared to minimal or no treatment, so we were looking for (and finding) large treatment effects. Today, new interventions are being compared to standard treatments which already work well, so smaller differences are expected. It is therefore not as easy to get very small p-values.
How are p-values determined?

The size of a p-value depends on:
- Size of the treatment effect (eg hazard ratio, relative risk, absolute risk difference, or mean difference)
- Size of the standard error, which is influenced by:
  - Number of subjects
  - Number of events*
  - Standard deviation*
Interpretation
Very small p-values (easy to interpret) arise when the effect size is large and the standard error is small.
Borderline p-values arise when either:
- We have a clinically meaningful treatment effect but a moderate or large standard error (usually when there are insufficient participants or events), or
- The treatment effect is smaller than expected (should have had a larger trial).
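The dependence of the p-value on effect size and standard error can be sketched in Python using the two relative-risk examples given earlier. This is an illustration only: it assumes a normal approximation on the log scale, and the function name `p_from_ratio_ci` is hypothetical. Because the reported confidence limits are rounded to two decimal places, the recovered p-values only approximate those quoted on the slide.

```python
from math import log
from statistics import NormalDist


def p_from_ratio_ci(estimate, lower, upper, level=0.95):
    """Approximate two-sided p-value for a ratio (RR, HR, OR) from its CI.

    Assumes the log of the ratio is approximately normally distributed.
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    # Standard error on the log scale, recovered from the width of the CI.
    se = (log(upper) - log(lower)) / (2 * z_crit)
    # Test statistic against the no-effect value (ratio = 1, log ratio = 0).
    z = log(estimate) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Same effect size (RR 0.75), different standard errors.
narrow = p_from_ratio_ci(0.75, 0.57, 0.99)
wide = p_from_ratio_ci(0.75, 0.55, 1.03)
print(f"narrower CI: p = {narrow:.3f}")
print(f"wider CI:    p = {wide:.3f}")
```

The treatment effect is identical in both calls; only the standard error differs, and that alone moves the p-value across the 0.05 line.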
An example: EICESS-92 phase III trial

A trial comparing standard chemotherapy with or without etoposide for treating Ewing's sarcoma (mainly a childhood cancer) at high risk of recurrence/death.
Primary endpoint: event-free survival
Powered to detect a hazard ratio of 0.60
Sample size: 400 patients (but 492 actually recruited)
The EICESS-92 Phase III Trial

Observed HR: 0.83 (95% CI: 0.65-1.05), p=0.12
P>0.05. Should we conclude no effect?
But a 17% risk reduction is clinically significant, although smaller than the 40% initially expected.
How do we interpret this?
EICESS-92: The Confidence Interval

Most people understand that the true effect is likely to lie somewhere within the CI range - hence the possibility of it being 1.0 (no effect). But there is a common misconception that it lies anywhere within this range with equal probability.

The true HR is more likely to lie around the estimated HR (0.83) than at the extremes of the confidence interval.

There is a 50% chance that the range 0.77 to 0.90 contains the true hazard ratio.
Similarly, there is a 75% chance that the range 0.72 to 0.95 contains the true HR.

The upper limit of the confidence interval is 1.05 and only just exceeds 1.0.
There is only a 6% chance that the range of 1.0 and above (no effect or harm) contains the true HR.
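The narrower ranges and the 6% figure can be approximately reproduced from the reported HR and 95% CI alone. The sketch below is not the authors' calculation; it assumes the estimate is normally distributed on the log scale, and all function names are illustrative. Small differences from the slide values are expected because the published limits are rounded.

```python
from math import exp, log
from statistics import NormalDist

STD_NORMAL = NormalDist()


def log_scale_se(lower, upper, level=0.95):
    """Standard error of log(ratio), recovered from a reported CI."""
    z_crit = STD_NORMAL.inv_cdf(0.5 + level / 2)
    return (log(upper) - log(lower)) / (2 * z_crit)


def interval(estimate, se, level):
    """Central interval for a ratio at the requested level."""
    z_crit = STD_NORMAL.inv_cdf(0.5 + level / 2)
    centre = log(estimate)
    return exp(centre - z_crit * se), exp(centre + z_crit * se)


def prob_ratio_at_least_one(estimate, se):
    """Chance that the true ratio is 1.0 or above (no effect or harm)."""
    return 1 - STD_NORMAL.cdf((0.0 - log(estimate)) / se)


# EICESS-92: HR 0.83 (95% CI 0.65-1.05)
hr = 0.83
se = log_scale_se(0.65, 1.05)
lo50, hi50 = interval(hr, se, 0.50)
lo75, hi75 = interval(hr, se, 0.75)
p_no_benefit = prob_ratio_at_least_one(hr, se)

print(f"50% range: {lo50:.2f} to {hi50:.2f}")
print(f"75% range: {lo75:.2f} to {hi75:.2f}")
print(f"Chance that true HR >= 1.0: {p_no_benefit:.1%}")
```

Under this approximation, the one-sided chance of "no effect or harm" is half the two-sided p-value of 0.12, which is where the figure of about 6% comes from.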
The conclusion reported in the paper was that the addition of etoposide seemed to be beneficial. This is the only randomised trial of etoposide in these children. The disorder is uncommon: it took 6.5 years to recruit 492 patients across Europe, so another trial is unlikely. Although the target sample size was exceeded, the treatment effect was smaller than expected (HR 0.83 vs 0.60), which is probably why the result was not statistically significant (i.e. the trial was not big enough).
Are these sorts of results common and how are they reported?
The Literature Search

We conducted a literature search to see how often trials with borderline results arose and how they were reported. We looked through every issue of 6 major journals in 2009. The journals chosen were:
- The BMJ
- The Lancet
- JAMA
- New England Journal of Medicine
- Journal of the National Cancer Institute
- Journal of Clinical Oncology
The Literature Search

To be selected a paper had to:
- Report the results of a phase III randomised trial.
- Have borderline results for the primary outcome measure.
What counted as borderline?

To count as a borderline result we needed to see:
A non-zero effect size
AND
A p-value between 0.05 and 0.1
OR
One end of the 95% confidence interval close to the no effect value (eg for ratios, the upper limit of the CI had to be <1.1 or the lower limit >0.90)
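The selection rule above can be written as a small predicate for ratio measures. This is a sketch rather than the procedure actually used in the search. It makes two labelled assumptions: the p-value window is taken as 0.05 < p <= 0.1, and the confidence interval criterion is applied only when the interval includes 1.0. The function name `is_borderline_ratio` is hypothetical.

```python
def is_borderline_ratio(estimate, lower, upper, p_value=None):
    """Return True if a ratio result (HR, RR, OR) counts as borderline.

    Assumptions (not stated explicitly in the source):
    - the p-value window is 0.05 < p <= 0.1;
    - the CI criterion only applies when the interval includes 1.0.
    """
    if estimate == 1.0:
        return False  # a non-zero effect size is required

    p_borderline = p_value is not None and 0.05 < p_value <= 0.1
    # One end of the 95% CI lies close to the no-effect value of 1.0.
    ci_borderline = lower < 1.0 < upper and (upper < 1.1 or lower > 0.90)
    return p_borderline or ci_borderline


# EICESS-92: p = 0.12 is outside the window, but the upper limit is 1.05.
print(is_borderline_ratio(0.83, 0.65, 1.05, p_value=0.12))
# A conventionally significant result is not borderline under these rules.
print(is_borderline_ratio(0.75, 0.57, 0.99, p_value=0.048))
```

Under these assumptions the EICESS-92 result qualifies through the confidence interval criterion even though its p-value is above 0.1.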
Literature search results

Below is a table showing the numbers of phase III trials found and the number with borderline p-values.
Journal        Number of phase III trials   Number with borderline p-values
BMJ            44                           2
The Lancet     64                           3
JAMA           40                           2
NEJM           70                           8
JNCI           6                            3
JCO            64                           6
Total          288                          24 (1 in 12)
Literature Search Results

We examined the conclusion given in the abstract because this is what most people focus on. Was the language used appropriate? Some authors discussed their results further in the Discussion section.
Literature Search Results

Conclusion             Number of studies   Range of p-values
No effect              10                  0.06 - 0.17
Some evidence          11                  0.06 - 0.13
Confidence in effect   3                   0.056 - 0.1
Example 1
Interventions and patient group: Nurse-led psychoeducational intervention versus usual care for palliative care in patients with advanced cancer
Primary endpoint: Symptom intensity, assessed by an assessment scale (quality of life and resource use were other endpoints); N=322
Main result: Mean difference -27.8 scores (95% CI -57.2 to +1.6), P=0.06
Conclusion reported in the Abstract: Those receiving nurse-led intervention had higher scores for quality of life and mood, but did not have improvements in symptom intensity scores
Bakitas et al, JAMA 2009;302:741-9.
Example 2
Interventions and patient group: Tailored care plan versus usual care in patients with coronary heart disease
Primary endpoint: Patients with systolic blood pressure >140 mm Hg at 18 months (hospital admission was another endpoint); N=903
Main result: Odds ratio 0.66 (95% CI 0.43 to 1.01), P=0.06
Conclusion reported in the Abstract: Admissions to hospital were significantly reduced but no other clinical benefits were shown
Murphy et al, BMJ 2009;339:b4220.
Example 3
Interventions and patient group: Pre-surgical chemoradiotherapy versus chemotherapy in patients with locally advanced cancer of the esophagogastric junction
Primary endpoint: Overall survival; N=126 (target was 576)
Main result: Hazard ratio 0.67 (95% CI 0.41 to 1.07), P=0.07
Conclusion reported in the Abstract: Although statistical significance was not achieved, results point to a survival advantage for preoperative chemoradiotherapy
Stahl et al, J Clin Oncol 2009;27:851-6.
Example 4
Interventions and patient group: Aerobic exercise training plus usual care versus usual care alone, in patients with chronic heart failure
Primary endpoint: All-cause mortality or hospitalisation; N=2331
Main result: Hazard ratio 0.93 (95% CI 0.84 to 1.02), P=0.13
Conclusion reported in the Abstract: ...exercise training resulted in non-significant reductions in the primary endpoint.
O'Connor et al, JAMA 2009;301:1439-50.
Example 5
Interventions and patient group: Artesunate suppository versus placebo in patients with severe malaria who cannot be treated orally; N=12,068
Primary endpoint: Mortality
Main result: Risk difference -0.4% (95% CI -1.0 to +0.2%), P=0.1
Conclusion reported in the Abstract: ...a single inexpensive artesunate suppository substantially reduces the risk of death or permanent disability
Gomes et al, Lancet 2009;373:557-66.
Example 6
Interventions and patient group: Telephone counselling using cognitive behavioural skills vs. no intervention to encourage smoking cessation in adolescents; N=2151
Primary endpoint: 6-months prolonged abstinence from smoking
Main result: Absolute risk difference 4.0% (95% CI -0.2 to 8.1%), P=0.06
Conclusion reported in the Abstract: ...personalized motivational interviewing...is effective in increasing teen smoking cessation
Peterson et al, J Natl Cancer Inst 2009;101:1378-92.
Papers with borderline negative results

What if a new intervention appears to show harm but has a borderline p-value? Perhaps authors would be inclined to be firmer with conclusions than if a new intervention shows possible benefit? We found two such papers where the authors only concluded that it did not show benefit.
Papers with borderline negative results

Trial of calcium dobesilate vs placebo for the prevention of clinically significant macular oedema (CSME) in 635 patients with type 2 diabetes.
86 patients in the calcium dobesilate group and 69 in the placebo group developed CSME.
Hazard ratio 1.32 (95% CI 0.96-1.81), p=0.08
Authors' conclusion: Calcium dobesilate did not reduce the risk of development of CSME.
Borderline results elsewhere

We were sent this table after our paper was published, showing the results and conclusions from papers on statins and mortality.
Meta-analysis                          Risk estimate (95% CI)   Authors' conclusion
Arch Intern Med 2005;165:725-730       0.87 (0.81-0.94)         Decreases mortality
Arch Intern Med 2006;166:2307-2313     0.92 (0.84-1.01)         No effect
J Am Coll Cardiol 2008;52:1769-81      0.93 (0.87-0.99)         Decreases mortality
BMJ 2009;338:b2376                     0.88 (0.81-0.96)         Decreases mortality
Arch Intern Med 2010;170:1024-1031     0.91 (0.83-1.01)         No effect
Possible solutions
- Design trials with larger numbers.
  - But this is not always feasible (eg high costs or rare disorder).
  - However, even a relatively large trial can produce an effect size smaller than expected (Ewing's sarcoma example).
- Meta-analyses. Example (doublet chemotherapy for pancreatic cancer):
  - One trial: HR 0.86, 95% CI 0.72-1.02, p=0.08
  - Meta-analysis of 3 trials: HR 0.86, 95% CI 0.75-0.98, p=0.02
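The pancreatic cancer example shows the same hazard ratio with a narrower interval once trials are pooled. A rough way to quantify this is the statistical information (the inverse of the squared standard error on the log scale) implied by each reported interval. The sketch below assumes a normal approximation on the log scale; `information_from_ci` is an illustrative name, and the result is only approximate because the limits are rounded.

```python
from math import log
from statistics import NormalDist


def information_from_ci(lower, upper, level=0.95):
    """Inverse variance of log(ratio), recovered from a reported CI."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    se = (log(upper) - log(lower)) / (2 * z_crit)
    return 1 / se ** 2


single = information_from_ci(0.72, 1.02)  # one trial
pooled = information_from_ci(0.75, 0.98)  # meta-analysis of 3 trials
ratio = pooled / single

print(f"Pooled analysis carries about {ratio:.1f} times the information")
```

The effect estimate is unchanged at 0.86; the smaller standard error in the pooled analysis is what moves the result from borderline to conventionally significant.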
Conclusions
- Borderline results cannot be used as strong evidence either in favour of or against an intervention.
- But do not completely dismiss an effect if p>0.05 when the treatment effect is clinically meaningful.
- Do not conclude no effect; look at other endpoints, and other evidence.
- A lack of statistical significance does not mean lack of an effect (Altman & Bland BMJ 1995).
Conclusions
- Say that there is probably evidence of an effect but use appropriate language, eg words such as "suggestion", "indication" and "seems".
- The same principles apply to other areas of research (eg risk factors).
