A Genomic Strategy
original article
Abstract

Background
Clinical trials have indicated a benefit of adjuvant chemotherapy for patients with stage IB, II, or IIIA (but not stage IA) non–small-cell lung cancer (NSCLC). This classification scheme is probably an imprecise predictor of the prognosis of an individual patient. Indeed, approximately 25 percent of patients with stage IA disease have a recurrence after surgery, suggesting the need to identify patients in this subgroup for more effective therapy.

Methods
We identified gene-expression profiles that predicted the risk of recurrence in a cohort of 89 patients with early-stage NSCLC (the lung metagene model). We evaluated the predictor in two independent groups of 25 patients from the American College of Surgeons Oncology Group (ACOSOG) Z0030 study and 84 patients from the Cancer and Leukemia Group B (CALGB) 9761 study.

Results
The lung metagene model predicted recurrence for individual patients significantly better than did clinical prognostic factors and was consistent across all early stages of NSCLC. Applied to the cohorts from the ACOSOG Z0030 trial and the CALGB 9761 trial, the lung metagene model had an overall predictive accuracy of 72 percent and 79 percent, respectively. The predictor also identified a subgroup of patients with stage IA disease who were at high risk for recurrence and who might be best treated by adjuvant chemotherapy.

Conclusions
The lung metagene model provides a potential mechanism to refine the estimation of a patient’s risk of disease recurrence and, in principle, to alter decisions regarding the use of adjuvant chemotherapy in early-stage NSCLC.

From the Institute for Genome Sciences and Policy (A.P., S.M., H.K.D., A.B., J.K., G.S.G., M.W., J.R.N.) and the Institute of Statistics and Decision Sciences (S.M., M.W.), Duke University; and the Departments of Medicine (A.P., J.K., M.K., G.S.G.), Surgery (R.P., D.H.H.), and Molecular Genetics and Microbiology (H.K.D., A.B., J.R.N.), Duke University Medical Center, both in Durham, N.C.; the Department of Medicine, University of Minnesota, Minneapolis (R.K.); and the Department of Pathology and Immunology, Washington University School of Medicine, St. Louis (M.A.W.). Address reprint requests to Dr. Nevins at the Duke Institute for Genome Sciences and Policy, Duke University, 101 Science Dr., Box 3382, Durham, NC 27708, or at nevin001@mc.duke.edu.

N Engl J Med 2006;355:570-80.
Copyright © 2006 Massachusetts Medical Society.
Lung cancer is the leading cause of death from cancer among both men and women in the United States, and non–small-cell lung cancer (NSCLC) accounts for almost 80 percent of such deaths.1,2 The clinical staging system has been the standard for determining lung-cancer prognosis.3-5 Although other clinical and biochemical markers have prognostic significance,6,7 none are more accurate than the clinicopathological stage.8

The current standard of treatment for patients with stage I NSCLC is surgical resection, despite the observation that nearly 30 to 35 percent will relapse after the initial surgery and thus have a poor prognosis,2,4 indicating that a subgroup of these patients might benefit from adjuvant chemotherapy. Similarly, as a population, patients with clinical stage IB, IIA or IIB, or IIIA NSCLC receive adjuvant chemotherapy,9-13 but some may receive potentially toxic chemotherapy unnecessarily. Thus, the ability to identify subgroups of patients more accurately may improve health outcomes across the spectrum of disease.

Previous studies have described the development of gene-expression, protein, and messenger RNA profiles that are associated in some cases with the outcome of lung cancer.14-24 However, the extent to which these profiles can be used to refine the clinical prognosis and the context in which improved prognostic capability could be used to alter a clinical treatment decision were not clear. Thus, we evaluated the use of gene-expression patterns as a means of stratifying risk and treatment in NSCLC.

... listed in Table 1 of the Supplementary Appendix, available with the full text of this article at www.nejm.org. All patients were enrolled according to protocols approved by the institutional review board of Duke University, after written informed consent had been obtained.

Histopathological Evaluation
For each cohort, a single pathologist reviewed all slides to determine whether they met the histopathological criteria for NSCLC of the World Health Organization, including the subtype of adenocarcinoma and the degrees of differentiation, lymphatic invasion, and vascular invasion. Only samples with a tumor-cell content of more than 50 percent were used in the analysis.

Gene-Expression Arrays
Total RNA was extracted from the tumor tissue with RNeasy Kits (Qiagen). The RNA quality was assessed with the use of a bioanalyzer (model 2100, Agilent). Hybridization targets were prepared from the total RNA according to standard Affymetrix protocols (described in detail in the Supplementary Appendix, along with the methods involved in the scanning of the arrays and the normalization of the resulting data). The microarray assays were carried out with Affymetrix GeneChips (U133 Plus2). All raw data and data transformed with the use of the robust multiarray average expression measure for the Duke, ACOSOG, and CALGB data sets are available elsewhere (accession number GSE3593 in the Gene Expression Omnibus database at www.ncbi.nlm.nih.gov/geo).
* Plus–minus values are means ±SD. Percentages may not total 100, because of rounding.
† There were more men in the study cohorts, since one of the principal sites involved was a Veterans Affairs medical center.
‡ Race was self-reported.
§ For the ACOSOG and CALGB data sets, the accuracy was predicted with the use of the Duke cohort as the training cohort. Recurrence was defined with the use of a probability of 0.5 as a cutoff.
[Figure panels (images not recovered): a tree node split on mgene79 (≤0.09 vs. >0.09; node annotations 30/41, 0.61 and 18/0, 0.06), and two plots of Probability of Recurrence against No. of Samples (0 to 100), with samples grouped as "Alive without recurrence" or "Disease recurrence or death".]
... classification- and regression-tree analysis to sample these metagenes and build prognostic models; this approach mines the collection of profiles to predict the clinical outcome best. An example tree (one of many generated in the analysis) is depicted in Figure 2B.

The predictive accuracy of each model was initially assessed with the use of leave-one-out cross-validation, in which the analysis is performed repeatedly, one sample is removed each time, and the probability of recurrence is predicted for that sample. Because the entire model-building process is repeated for each prediction, the reproducibility of the approach is also evaluated. As a measure of model stability, we generated multiple iterations of randomly split training and validation sets from within the Duke cohort; the resulting accuracy of prognostic capability exceeded 85 percent (data not shown).

The lung metagene model for the prediction of recurrence was superior to a predictive model generated with the same methods but that included clinical data alone (including age, sex, tumor diameter, stage of disease, histologic subtype, and smoking history). In the Duke cohort, the lung metagene model predicted disease recurrence with an overall accuracy of 93 percent (Fig. 2C). The model built with clinical data had an accuracy of only 64 percent (Fig. 2D). Inclusion of the clinical data with the genomic data did not further improve the accuracy of the prediction of recurrence over that of the genomic data alone.

The outperformance of the clinical model by the lung metagene model in identifying patients at risk for recurrence was also supported by the results of Kaplan–Meier analyses. The lung metagene model identified two distinct groups of patients with respect to survival (Fig. 3A). In contrast, the distinction was less clear for each of the models based on clinical predictions (one that combined the clinical variables in a manner similar to the lung metagene model, and another that was based on individual clinical prognostic factors [tumor diameter and stage of disease are shown]) (Fig. 3B). Univariate and multivariate analyses (with and without the genome-based assessment of the risk of recurrence) to assess the relative prognostic value of the individual clinical variables and the lung metagene model showed that the lung metagene model performed significantly better (P<0.001 by multivariate analysis) than stage of disease, tumor diameter, nodal status, age, sex, histologic subtype, or smoking history (Table 3 in the Supplementary Appendix).

Finally, further confirmation that the lung metagene model represents the biology of the tumor was provided by the finding that the metagenes with the greatest discriminatory capability in the model included genes that have previously been shown to have clinical relevance in NSCLC. In some instances, a metagene represented a single molecular process such as angiogenesis (metagene 19), which is a proven target for therapy in NSCLC. Other key metagenes, such as metagene 41, represented a combination of biologic processes: for example, the BRAF, phosphatidylinositol 3 kinase, TP53, and MYC signaling pathways.

Validation of the Metagene Prognostic Model

Validation across Early Stages and Subtypes of NSCLC
The samples used to devise the prognostic model represented both the major histologic subtypes of NSCLC (adenocarcinoma and squamous-cell carcinoma) and all the early stages of disease. To assess the general robustness of the prognostic model in the Duke cohort, we examined the predictions of risk as a function of these variables. The lung metagene model was consistently accurate across all the early stages of NSCLC (Fig. 1 in the Supplementary Appendix) and between the major histologic subtypes (Fig. 2 in the Supplementary Appendix), not only in the estimated risk of recurrence but also in the results of the Kaplan–Meier survival analysis for each stage or subtype.

Validation across Data from Two Multicenter Studies
For a new prognostic model that assesses the risk of recurrence to be used to inform the decision of whether to administer adjuvant chemotherapy, the model must be shown to be robust when applied to independent, heterogeneous populations of patients and conditions of sample acquisition. We therefore evaluated the ability of the metagene model generated from the Duke training cohort to predict the risk of recurrence by using samples from two multicenter, cooperative group studies (ACOSOG Z0030 and CALGB 9761) (Fig. 1). These sets of samples represented the full spectrum of clinical outcomes; the samples were not selected with respect to the duration of survival.
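The leave-one-out cross-validation described in this section can be illustrated with a short sketch. It is a minimal illustration under stated assumptions, not the authors' method: a nearest-centroid classifier stands in for the Bayesian classification-tree model, the function names are hypothetical, and the data are invented toy values. The point it shows is that the whole model is refitted each time a sample is held out.

```python
from statistics import mean

def fit_centroids(samples, labels):
    """Stand-in model (assumption): per-class mean of each feature."""
    centroids = {}
    for cls in set(labels):
        rows = [s for s, y in zip(samples, labels) if y == cls]
        centroids[cls] = [mean(col) for col in zip(*rows)]
    return centroids

def predict(centroids, sample):
    """Return the class whose centroid is nearest (squared Euclidean distance)."""
    def dist(center):
        return sum((a - b) ** 2 for a, b in zip(sample, center))
    return min(centroids, key=lambda cls: dist(centroids[cls]))

def leave_one_out_accuracy(samples, labels):
    """Hold out each sample once, refit on the rest, and score the prediction."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit_centroids(train_x, train_y)  # full refit per held-out sample
        correct += predict(model, samples[i]) == labels[i]
    return correct / len(samples)

# Hypothetical toy data: two "metagene" scores per tumor; 1 = recurrence.
toy_x = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [0.9, 0.8], [0.8, 1.0], [1.0, 0.9]]
toy_y = [0, 0, 0, 1, 1, 1]
loo_acc = leave_one_out_accuracy(toy_x, toy_y)
print(loo_acc)
```

Because the held-out sample never contributes to the fit that predicts it, the resulting accuracy is less optimistic than one computed on the training data itself.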
Figure 3. Kaplan–Meier Survival Estimates for the Duke Training Cohort.
Estimates based on predictions from the lung metagene model demonstrate the value of that approach (Panel A). Panel B shows the estimates based on the clinical model of prognosis, as well as those based on individual clinical characteristics (here, tumor diameter and stage of disease). A high risk of recurrence was defined as a probability of recurrence of more than 0.5, and a low risk of recurrence was defined as a risk of 0.5 or less. P values were obtained with the use of a log-rank test. Tick marks indicate patients whose data were censored by the time of last follow-up or owing to death.
[Panel A, Lung Metagene Model (image not recovered): Survival (%) against Months (0 to 100), with curves for low and high risk of recurrence; P<0.001.]
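The Kaplan–Meier estimates in this figure follow the standard product-limit calculation, which can be sketched as below. This is a minimal sketch, not the authors' analysis code: the function name and the toy follow-up data are hypothetical, and an event flag of 1 is assumed to mean that the event of interest was observed, with 0 meaning the patient was censored.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time for each patient (for example, months)
    events : 1 if the event was observed at that time, 0 if censored
    Returns a list of (time, survival_probability) at each event time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        observed = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)
        if observed:
            # Multiply by the fraction of at-risk patients who survive time t.
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored patients both leave the risk set
    return curve

# Hypothetical follow-up data: the second patient is censored at month 8.
toy_times = [5, 8, 12, 20]
toy_events = [1, 0, 1, 1]
km_curve = kaplan_meier(toy_times, toy_events)
print(km_curve)
```

Censored patients lower the number at risk without lowering the survival estimate, which is why they appear as tick marks on the curve rather than as steps.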
[Figure 3, Panel B (image not recovered): Survival (%) against Months (0 to 100); recovered labels include "High risk of recurrence" (P=0.04) and "Tumor diameter ≤3.0 cm".]

... and multivariate analyses showed that the metagene model was a significantly more accurate predictor (P<0.001 by multivariate analysis) than stage of disease, tumor diameter, nodal status, age, sex, histologic subtype, or smoking history (Table 3 in the Supplementary Appendix). The accuracy of the prediction of recurrence in the ACOSOG samples was approximately 72 percent (sensitivity, 85 percent; specificity, 58 percent; positive predictive value, 69 percent; and negative predictive value, 78 percent) (Fig. 4A). The level of accuracy provides an assessment of the robustness ...
[Figure 4 panels (images not recovered): two rows, each pairing a plot of probability of recurrence against No. of Samples (0 to 30 in the first row, 0 to 90 in the second) with Kaplan–Meier curves of Survival (%) against Months (0 to 60 and 0 to 150, respectively) for low and high risk of recurrence; P<0.001 in each row.]
Figure 4. Independent Validation of the Lung Metagene Model with the Use of Data from the ACOSOG Z0030 Study and the CALGB 9761 Study.
The lung metagene model was used to estimate the probabilities of recurrence for the ACOSOG samples (Panel A) and the CALGB samples (Panel B) and to generate Kaplan–Meier survival estimates according to the predicted risk of recurrence. For the CALGB cohort, investigators were unaware of the clinical outcomes, and the predictive results were submitted to the CALGB statistical center for the evaluation of performance. I bars represent 95 percent confidence intervals. A high risk of recurrence was defined as a probability of recurrence of more than 0.5, and a low risk of recurrence was defined as a probability of 0.5 or less. P values were obtained with the use of a log-rank test. Tick marks indicate patients whose data were censored by the time of last follow-up or owing to death.
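The accuracy, sensitivity, specificity, and predictive values quoted for the validation cohorts all derive from a two-by-two table obtained by calling a patient high risk when the predicted probability exceeds 0.5. The sketch below is a minimal illustration, not the authors' code: the function names and the toy probabilities are hypothetical, and the counts are those the article reports for the 15-patient squamous-cell cohort (all 5 patients with recurrence and 7 of 10 patients without recurrence predicted correctly).

```python
def counts_at_cutoff(probabilities, recurred, cutoff=0.5):
    """Tally the two-by-two table; high risk means probability > cutoff."""
    tp = fp = tn = fn = 0
    for p, y in zip(probabilities, recurred):
        high_risk = p > cutoff
        if high_risk and y:
            tp += 1      # predicted recurrence, recurrence observed
        elif high_risk:
            fp += 1      # predicted recurrence, none observed
        elif y:
            fn += 1      # predicted no recurrence, recurrence observed
        else:
            tn += 1      # predicted no recurrence, none observed
    return tp, fp, tn, fn

def metrics_from_counts(tp, fp, tn, fn):
    """Standard summary measures of a binary prediction."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Hypothetical probabilities: a value of exactly 0.5 counts as low risk.
toy_counts = counts_at_cutoff([0.9, 0.6, 0.5, 0.2], [1, 0, 1, 0])

# Counts reported for the 15-patient squamous-cell cohort.
squamous = metrics_from_counts(tp=5, fp=3, tn=7, fn=0)
print(squamous["accuracy"])
```

With these counts the accuracy is 12 of 15, or 80 percent, matching the figure given in the text; the same table also yields a sensitivity of 100 percent and a specificity of 70 percent for that cohort.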
... recurrence were submitted to a CALGB statistician for comparison with the true outcomes. Once again, univariate and multivariate analyses showed that the lung metagene model predicted outcome significantly better (P<0.001 by multivariate analysis) than the stage of disease, tumor diameter, nodal status, age, sex, histologic subtype, or smoking history (Table 3 in the Supplementary Appendix). The overall predictive accuracy of the model for the CALGB samples was 79 percent (sensitivity, 68 percent; specificity, 88 percent; positive predictive value, 79 percent; and negative predictive value, 80 percent) (Fig. 4B). Again, the Kaplan–Meier analysis showed a significant difference in the survival rates of patients with a probability of recurrence of greater than 0.5 as compared with 0.5 or less, according to the lung metagene model (Fig. 4B). Similar to the results seen for the Duke and ACOSOG data, the adjusted odds ratio for disease recurrence in the CALGB cohort was 16.6 (95 percent confidence interval, 4.4 to 62.8) when the model estimate for recurrence was greater than 0.5 (Table 3 in the Supplementary Appendix).

We also applied the lung metagene model to another cohort of 15 patients with surgically resected stage I squamous-cell lung cancer. Using the lung metagene model, we were able to predict the outcome accurately in all 5 patients with recurrence and in 7 of 10 patients without recurrence, for an overall accuracy of 80 percent (Fig. 3 in the Supplementary Appendix).

Finally, to evaluate the extent to which the metagene model could increase the ability of clinicians to estimate prognosis, we computed a C statistic as a measure of the capacity of the clinical or genomic information to identify patients according to the risk of recurrence. For the ACOSOG cohort, the C statistic based on clinical variables alone was 0.67; this value was increased to 0.84 by the inclusion of genomic data. For the CALGB cohort, inclusion of the genomic data increased the value from 0.73 to 0.87. Clearly, the genomic data transformed a limited clinical-based prognosis to one with substantial capacity to identify patients who were likely to have disease recurrence.

Application of the Refined Prognosis
Previous studies have shown that 25 percent of patients with stage IA NSCLC will have disease recurrence within five years. Thus, some patients with stage IA NSCLC might be more appropriately categorized as being at higher risk than others and might be candidates for adjuvant chemotherapy. We therefore focused on the 68 patients from the Duke, ACOSOG, and CALGB cohorts who were classified clinically as having stage IA disease. Kaplan–Meier survival curves were generated for the group as a whole, as well as for the subgroups predicted to be at high or low risk for recurrence by the lung metagene model. Although the survival rate for the group was approximately 70 percent at four years, the survival rate for those predicted to be at high risk was less than 10 percent (Fig. 5A), thus identifying the subgroup of patients with stage IA NSCLC at risk for recurrence.

Discussion

Although gene-expression profiles that can classify patients with cancer according to their risk of recurrence have been described in many instances, the prognostic tool we devised could be used to change a clinical decision. In particular, the guidelines for the treatment of patients with stage I NSCLC provide an opportunity to use an improved prognostic model to refine the currently imprecise assessment of risk and the decision regarding whom to treat, and thus potentially leading to more personalized cancer treatment. In this case, the refinement of prognosis with the use of the metagene model provides the opportunity for a prospective, randomized, phase 3 clinical trial that would evaluate the benefit of the identification of a subgroup of patients with stage IA disease estimated to be at high risk for recurrence (Fig. 5B). Patients initially classified as having clinical stage IA disease would undergo surgery, and the metagene model would then be applied to identify the patients predicted to be at high risk for recurrence. Patients at high risk would then be randomly assigned to observation (the current standard of care for stage IA disease) or adjuvant chemotherapy, in order to evaluate the extent to which the use of genomic reclassification improves survival. Our study is a critical first step in the use of genomic tools as a strategy to refine the prognosis and improve the selection of patients appropriate for adjuvant chemotherapy.

Figure 5. Application of the Lung Metagene Model to Refine the Assessment of Risk and Guide the Use of Adjuvant Chemotherapy in Stage IA NSCLC.
Panel A shows the Kaplan–Meier survival estimates for a group of patients with stage IA disease from the Duke, ACOSOG, and CALGB cohorts and the subgroups predicted to have either a high probability (>0.5) or a low probability (≤0.5) of recurrence. P values were obtained with the use of a log-rank test. Tick marks indicate patients whose data were censored by the time of last follow-up or owing to death. Panel B illustrates the possible design of a planned prospective, phase 3 clinical trial involving patients with stage IA NSCLC to evaluate the performance of the metagene model.
[Figure 5 panels (images not recovered): recovered labels are "Patients with stage IA NSCLC", "Observation", and "Chemotherapy".]

Drs. Nevins, West, and Dressman report holding equity in Expression Analysis, a DNA microarray service provider established by Duke University. Drs. Nevins, West, Dressman, and Ginsburg report having served on the advisory board of Expression Analysis. Dr. Dressman reports having served as a paid consultant to Expression Analysis, which carried out the microarray assays with Affymetrix GeneChips (U133 Plus2). Dr. Harpole reports having served on the advisory board of Genentech (OSI Pharmaceuticals). No other potential conflict of interest relevant to this article was reported.

We are indebted to the participants of the ACOSOG Z0030 and CALGB 9761 studies; to Mark Allen, principal investigator of the ACOSOG Z0030 study; to Michael Maddaus, principal investigator of the CALGB 9761 study; to Xiaofei Wang, statistician for the CALGB 9761 study, who was also responsible for the blinded validation of the model predictions; to David Beer, at the University of Michigan, for the array data on the CALGB 9761 data set; and to Kaye Culler for her assistance with the preparation of the manuscript.
References
1. Spira A, Ettinger DS. Multidisciplinary management of lung cancer. N Engl J Med 2004;350:379-92.
2. Hoffman PC, Mauer AM, Vokes EE. Lung cancer. Lancet 2000;355:479-85. [Erratum, Lancet 2000;355:1280.]
3. Mountain CF. Revisions in the International System for Staging Lung Cancer. Chest 1997;111:1710-7.
4. Nesbitt JC, Putnam JB Jr, Walsh GL, Roth JA, Mountain CF. Survival in early-stage non-small cell lung cancer. Ann Thorac Surg 1995;60:466-72.
5. Mountain CF. The new International Staging System for Lung Cancer. Surg Clin North Am 1987;67:925-35.
6. D’Amico TA, Massey M, Herndon JE II, Moore MB, Harpole DH Jr. A biologic risk model for stage I lung cancer: immunohistochemical analysis of 408 patients with the use of ten molecular markers. J Thorac Cardiovasc Surg 1999;117:736-43.
7. Brundage MD, Davies D, Mackillop WJ. Prognostic factors in non-small cell lung cancer: a decade of progress. Chest 2002;122:1037-57.
8. Meyerson M, Carbone DP. Genomic and proteomic profiling of lung cancers: lung cancer classification in the age of targeted therapy. J Clin Oncol 2005;23:3219-26.
9. Arriagada R, Bergman B, Dunant A, et al. Cisplatin-based adjuvant chemotherapy in patients with completely resected non–small-cell lung cancer. N Engl J Med 2004;350:351-60.
10. Winton T, Livingston R, Johnson D, et al. Vinorelbine plus cisplatin vs. observation in resected non–small-cell lung cancer. N Engl J Med 2005;352:2589-97.
11. Douillard J-Y, Rosell R, Delena M, Legroumellec A, Torres A, Carpagnano F. ANITA: phase III adjuvant vinorelbine (N) and cisplatin (P) versus observation (OBS) in completely resected (stage I-III) non-small-cell lung cancer (NSCLC) patients (pts): final results after 70-month median follow-up. J Clin Oncol 2005;23:Suppl:7013. abstract.
12. Kato H, Ichinose Y, Ohta M, et al. A randomized trial of adjuvant chemotherapy with uracil–tegafur for adenocarcinoma of the lung. N Engl J Med 2004;350:1713-21.
13. Strauss GM, Herndon JE II, Maddaus MA, et al. Randomized clinical trial of adjuvant chemotherapy with paclitaxel and carboplatin following resection in Stage 1B non-small cell lung cancer. J Clin Oncol 2004;22:7019. abstract.
14. Tonon G, Wong KK, Maulik G, et al. High-resolution genomic profiles of human lung cancer. Proc Natl Acad Sci U S A 2005;102:9625-30.
15. Schneider PM, Praeuer HW, Stoeltzing O, et al. Multiple molecular marker testing (p53, C-Ki-ras, c-erbB-2) improves estimation of prognosis in potentially curative resected non-small cell lung cancer. Br J Cancer 2000;83:473-9.
16. Berrar D, Sturgeon B, Bradbury I, Downes CS, Dubitzky W. Survival trees for analyzing clinical outcome in lung adenocarcinomas based on gene expression profiles: identification of neogenin and diacylglycerol kinase alpha expression as critical factors. J Comput Biol 2005;12:534-44.
17. Ju Z, Kapoor M, Newton K, et al. Global detection of molecular changes reveals concurrent alteration of several biological pathways in nonsmall cell lung cancer cells. Mol Genet Genomics 2005;274:141-54.
18. Beer DG, Kardia SLR, Huang CC, et al. Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nat Med 2002;8:816-24.
19. Chen G, Gharib TG, Wang H, et al. Protein profiles associated with survival in lung adenocarcinoma. Proc Natl Acad Sci U S A 2003;100:13537-42.
20. Bhattacharjee A, Richards WG, Staunton J, et al. Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc Natl Acad Sci U S A 2001;98:13790-5.
21. Wigle DA, Jurisica I, Radulovich N, et al. Molecular profiling of non-small cell lung cancer and correlation with disease-free survival. Cancer Res 2002;62:3005-8.
22. Kikuchi T, Daigo Y, Katagiri T, et al. Expression profiles of non-small cell lung cancers on cDNA microarrays: identification of genes for prediction of lymph-node metastasis and sensitivity to anti-cancer drugs. Oncogene 2003;22:2192-205.
23. Garber ME, Troyanskaya OG, Schluens K, et al. Diversity of gene expression in adenocarcinoma of the lung. Proc Natl Acad Sci U S A 2001;98:13784-9.
24. Yanaihara N, Caplen N, Bowman E, et al. Unique microRNA molecular profiles in lung cancer diagnosis and prognosis. Cancer Cell 2006;9:189-98.
25. Pittman J, Huang E, Dressman H, et al. Integrated modeling of clinical and gene expression information for personalized prediction of disease outcomes. Proc Natl Acad Sci U S A 2004;101:8431-6.
26. Pittman J, Huang E, Nevins JR, Wang Q, West M. Bayesian analysis of binary prediction tree models for retrospectively sampled outcomes. Biostatistics 2004;5:587-601.
27. Nevins JR, Huang ES, Dressman H, Pittman J, Huang AT, West M. Towards integrated clinico-genomic models for personalized medicine: combining gene expression signatures and clinical factors in breast cancer outcomes prediction. Hum Mol Genet 2003;12:R153-R157.
28. Huang E, Cheng SH, Dressman H, et al. Gene expression predictors of breast cancer outcomes. Lancet 2003;361:1590-6.
29. West M, Blanchette C, Dressman H, et al. Predicting the clinical status of human breast cancer by using gene expression profiles. Proc Natl Acad Sci U S A 2001;98:11462-7.
30. Denison DGT, Mallick BK, Smith AFM. A Bayesian CART algorithm. Biometrika 1998;85:363-77.
31. Breiman L. Statistical modeling: the two cultures. Stat Sci 2001;16:199-225.