Class Prediction by Nearest Shrunken Centroids, With Applications To DNA Microarrays
FIG. 1. Centroids (grey) and shrunken centroids (red) for the lymphoma/leukemia data set. Each centroid has the overall centroid subtracted; hence, what we see are contrasts. The horizontal units are log ratios of expression. Going from left to right, the number of training samples is 27, 5 and 7. The order of the genes is determined by hierarchical clustering.
We propose the "nearest shrunken centroid" method, which uses denoised versions of the centroids as prototypes for each class. The optimally shrunken centroids, derived using a method described below, are shown as red bars in Figure 1. Classification is then made to the nearest (shrunken) centroid. The resulting procedure has zero test errors. In addition, only 81 genes have a nonzero red bar for one or more classes in Figure 1 and, hence, are the only ones that contribute toward the classification. The amount of shrinkage is determined by cross-validation.

In the preceding example, the (unshrunken) nearest centroid method had the same error rate as the nearest shrunken centroid procedure. This is not always the case. Table 1 shows results taken from Tibshirani, Hastie, Narasimhan and Chu (2002) on classification of small round blue cell tumors. The data are taken from Khan et al. (2001). There are 25 test samples
TABLE 1
Results on classification of small round blue cell tumors
FIG. 4. Shrunken differences d_ik for the 81 genes that have at least one nonzero difference.
classes is large. This can result in a cross-validation curve that has discrete jumps and high variability. To help with this problem, we can use the mean cross-validated log-likelihood rather than misclassification error. Since our model produces class probability estimates [see Equation (8) in Section 2.2], the log-likelihood of a test sample x* with class label y* is log p̂_y*(x*). The mean log-likelihood curve is typically smoother than the misclassification error curve.

Figure 6 shows the test set log-likelihood and misclassification error curves for the lymphoma data. (This is for illustration only; we are not suggesting use of the test error to select Δ.) They give a similar picture, although the choice of the smallest model where the log-likelihood starts to dip yields more genes than that from the misclassification error curve. In the next section we make use of the log-likelihood in estimation of class probabilities.

2.4 Class Probabilities and Discriminant Functions

We classify test samples to the closest shrunken centroid, again standardizing by s_i. We also make a correction for the relative abundance of members of each class. Details are given next.

Suppose we have a test sample (vector) with expression levels x* = (x_1*, x_2*, . . . , x_p*). We define the discriminant score for class k as

(6) δ_k(x*) = Σ_{i=1}^p (x_i* − x̄′_ik)² / s_i² − 2 log π_k.

The first part of (6) is simply the standardized squared distance of x* to the kth shrunken centroid. The second part is a correction based on the class prior probability π_k, where Σ_{k=1}^K π_k = 1. This prior gives the overall proportion of class k in the population.
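For concreteness, here is a minimal numerical sketch of the classification rule just described. It assumes the shrunken centroids, the per-gene standard deviations s_i (optionally offset by the constant s_0 discussed in Section 9) and the class priors π_k have already been computed; the softmax form used for the class probabilities is our reading of Equation (8), which is not reproduced in this excerpt, and all function names are illustrative.

```python
import numpy as np

def discriminant_scores(x_star, shrunken_centroids, s, priors):
    """Compute delta_k(x*) of Equation (6) for one test sample.

    x_star            : (p,) expression vector of the test sample
    shrunken_centroids: (K, p) array; row k holds the shrunken centroid for class k
    s                 : (p,) per-gene pooled within-class standard deviations
    priors            : (K,) class prior probabilities pi_k, summing to 1
    """
    # standardized squared distance to each shrunken centroid
    dist2 = np.sum(((x_star - shrunken_centroids) / s) ** 2, axis=1)
    # correction for the class priors
    return dist2 - 2.0 * np.log(priors)

def classify(x_star, shrunken_centroids, s, priors):
    """Assign x* to the class with the smallest discriminant score."""
    return int(np.argmin(discriminant_scores(x_star, shrunken_centroids, s, priors)))

def class_probabilities(x_star, shrunken_centroids, s, priors):
    """Softmax-style class probabilities (our reading of Equation (8))."""
    delta = discriminant_scores(x_star, shrunken_centroids, s, priors)
    w = np.exp(-0.5 * (delta - delta.min()))   # subtract the minimum for numerical stability
    return w / w.sum()
```

The cross-validated log-likelihood used above to choose Δ is then simply the average, over left-out samples, of the log of the probability assigned to each sample's true class.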
FIG. 6. Test set mean log-likelihood curve (red) and test set misclassification error curve (green). The latter has been translated so that it fits in the same plotting region. The broken line shows where the log-likelihood curve starts to dip, while the dotted line shows where the misclassification error starts to rise.
FIG. 7. Estimated test set probabilities using the 48 gene model from minimizing misclassification error (left) and the 78 gene model from maximizing the log-likelihood (right). Probabilities are partitioned by the true class. There are no classification errors in the test set.
FIG. 9. Centroids for each of four classes for the two simulation scenarios. The standard deviations for each class are indicated at the top of the plot.
expression values were independent Gaussian with variance 1. In the first simulation, the class centroids were [r(3, 500), r(0.4, 500)], [r(0.5, 100), r(0, 900)], [r(0, 100), r(0.5, 100), r(0, 800)] and [r(0, 100), r(0, 100), r(0.5, 100), r(0, 700)]. The centroids are shown in the top panel of Figure 9. Thus the first class is far from the others, in the space spanned by the first 500 genes. The top panel of Figure 8 shows the mean ± 1 standard deviation of the test error over five simulations. The methods used were default (equal) thresholds (red) and adaptive thresholds (green). The average values of the adaptive threshold were 2.0, 1.0, 1.0 and 1.0. The adaptive threshold method generally has lower test error.

In the second simulation, the means in the four classes were [r(0.5, 300), r(0, 700)], [r(0.5, 150), r(−0.5, 150), r(0, 700)], [r(−0.5, 150), r(0.5, 150), r(0, 700)] and [r(−0.5, 150), r(−0.5, 150), r(0, 700)]. The centroids are shown in the bottom panel of Figure 9. The standard deviations in each class were 2, 1.5, 1.5 and 1.0. Thus each class centroid is equidistant from the overall centroid (the origin), but the within-class standard deviations are different. The bottom of Figure 8 shows the results: again the adaptive threshold does better in terms of test error; the average values of the adaptive threshold were 1.4, 1.1, 1.2 and 1.0. With equal thresholds, the majority of nonzero genes were in class 1; under the adaptive thresholds, the distribution was more balanced.

4. SOFT VERSUS HARD THRESHOLDING

An alternative to the soft thresholding (5) would be to keep all differences greater in absolute value than Δ and discard the others; that is,

(10) d′_ik = d_ik · I(|d_ik| > Δ).

This is sometimes known as hard thresholding. It differs from soft thresholding in that differences greater than Δ are unchanged, rather than shrunken toward zero by the amount Δ. One drawback of hard thresholding is its "jumpy" nature: as the threshold is increased, a gene with a full contribution d_ik suddenly is set to zero.

To investigate the relative behavior of hard versus soft thresholding, we generated standard normal expression data for 1,000 genes and 40 samples, with 20 samples in each of two classes. For the first 100 genes, we added a random effect µ_i ∼ N(0, 0.5²) to each expression level in class 2 for each gene i. Hence 100 of the 1,000 genes are differentially expressed in the two classes by varying amounts. This experiment was repeated 10 times and the results were averaged. The left panel of Figure 10 shows the test error for hard and soft thresholding as the threshold is varied, while the right panel displays the mean squared error Σ_i (µ̂_i − µ_i)² / p, where µ̂_i = Σ_{j=1}^{20} x_ij / 20 − Σ_{j=21}^{40} x_ij / 20.
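The two thresholding rules are easy to state in code. The sketch below applies them to a vector of differences d_ik; the soft-threshold form sign(d)(|d| − Δ)_+ follows the verbal description of (5) above (shrink toward zero by Δ), and the function names are ours, not those of the PAM software.

```python
import numpy as np

def soft_threshold(d, delta):
    """Soft thresholding: shrink each difference toward zero by delta,
    setting it to zero once |d| <= delta (the rule referred to as (5))."""
    return np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)

def hard_threshold(d, delta):
    """Hard thresholding (10): keep differences with |d| > delta unchanged,
    discard the rest."""
    return d * (np.abs(d) > delta)

# As delta grows, soft thresholding moves a difference smoothly to zero,
# while hard thresholding drops it abruptly from its full value.
d = np.array([2.4, -0.8, 0.3])
for delta in (0.5, 1.0, 2.5):
    print(delta, soft_threshold(d, delta), hard_threshold(d, delta))
```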
FIG. 10. Simulated data in two classes. Left: Test misclassification error as the threshold is varied, using hard thresholding (h) and soft thresholding (s). Right: The estimation error Σ_i (µ̂_i − µ_i)² / p, where µ_i and µ̂_i are the true and estimated difference in expression between class 1 and class 2 for gene i. Results are averages over 10 simulations; standard error of the average is about 0.015 in the left panel and 0.01 in the right panel.
In the left panel, we see that soft thresholding yields lower test error at its minimum; the right panel shows that soft thresholding does a much better job of estimating the gene expression differences.

5. NATIONAL CANCER INSTITUTE CANCER LINES AND SUBCLASS DISCOVERY

Here we describe how to use nearest centroid shrinkage to discover subclasses. We consider data from Ross et al. (2000) that consist of measurements on 6,830 genes on 61 cell lines. The samples have been categorized into eight different cancer classes: breast (BRE), CNS, colon (COL), leukemia (LEU), melanoma (MEL), non-small cell lung cancer (NSC), ovarian (OVA) and renal (REN). We randomly chose a training set of size 40 and a test set of size 21, so that the classes were well represented in both sets. Default (equal) soft thresholding was used, with the prior probabilities set to the sample class proportions. The results are shown in Figure 11. The best cross-validated error rate occurs at about 5,000 genes, giving a test error of 5/21. Adaptive thresholding failed to improve this result.

FIG. 11. NCI cancer cell lines: training, cross-validation and test error curves.

We also tried both support vector machines (Ramaswamy et al., 2001) and regularized discriminant analysis (Section 7). Both gave five errors on the test set. However, neither method gave a simple picture of the data.

Next we show a generalization of the nearest shrunken centroid approach that facilitates the discovery of potentially important subclasses. It may be valuable biologically to look for distinct subclasses of diseases in microarray analyses. We can generalize the nearest shrunken centroid procedure to facilitate the discovery of subclasses. Consider the problem illustrated in Figure 12. The values indicate average gene expression. There are two subclasses in class 2, and each of these can be distinguished from class 1 based on a small set of genes. However, nearest shrunken centroids will fail here, because the overall centroids for each class are the same. Linear separating classifiers, such as support vector machines (SVM), and linear discriminant analysis will also do poorly here. Either could be made to work with a suitable nonlinear transformation of the features (or choice of kernel for the SVM); while these may give low prediction error, they may not reveal the biologically important subclasses that are present.

FIG. 12. Two class problem with distinct subclasses. Numbers indicate the average gene expression.
TABLE 2
NCI subclass results: test errors (out of 21 samples) for nearest shrunken centroid model with no subclasses (second column from left) and two subclasses per class (third column from left); the columns on the right show the resulting number of errors when a pair of subclasses for a given class is fused into one subclass

Number of genes   No subclasses   Two subclasses   BRE   CNS   COL   LEU   MEL   NSC   OVA   REN
6830                    5               6            6     6     6     6     6     6     7     5     8 (wait)
6830                    5               6            6     6     6     6     6     7     5     8
6827                    5               6            5     5     7     6     6     7     6     6
6122                    5               5            6     5     5     5     5     5     5     5
3571                    7               6            8     7     6     6     6     6     7     6
1695                    9               6            8     7     6     6     6     6     7     6
 696                    9               7            9     6     7     7     7     7     8     8
 293                    9               6            8     7     7     6     6     7     7     6
 119                   10               6            8     8     8     6     8     7     7     8
  42                   10              12           13    14    14    12    12    12    12    12
  17                   14              14           14    14    16    14    14    14    14    13
For any class, our idea is to apply r-means clustering to the samples in that class, resulting in r subclasses for that class. Doing this for each of the K classes results in a total of K · r subclasses. We apply nearest shrunken centroids to this r · K class problem. If the predicted class from this large problem is h, then our final predicted class is the class k that contains subclass h.

With typical sample sizes, the choice r = 2 will be most reasonable. Table 2 shows the results on the National Cancer Institute (NCI) data. Without subclasses, the test error rates start to rise when fewer than 2,000 or 3,000 genes are used. Using subclasses, we achieve about the same error rate with as few as 119 genes. The right part of the table shows that for 119 genes the subclasses are most important for BRE, CNS, COL, MEL and REN. The 119 gene solution is displayed in Figure 13 and shows some distinct subclasses among some of the main classes.
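A sketch of this subclass generalization is given below, using scikit-learn's KMeans for the within-class clustering and its NearestCentroid classifier with a shrinkage threshold as a stand-in for the full nearest shrunken centroid procedure (the PAM software implements the latter directly; the helper names here are ours).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestCentroid

def fit_subclass_model(X, y, r=2, delta=1.0, seed=0):
    """Split each class into r subclasses by K-means, then fit a (shrunken)
    nearest centroid classifier on the resulting r*K subclass labels."""
    sub_labels = np.empty(len(y), dtype=int)
    sub_to_class = {}
    next_label = 0
    for k in np.unique(y):
        idx = np.where(y == k)[0]
        km = KMeans(n_clusters=r, n_init=10, random_state=seed).fit(X[idx])
        for j in range(r):
            sub_to_class[next_label + j] = k
        sub_labels[idx] = next_label + km.labels_
        next_label += r
    clf = NearestCentroid(shrink_threshold=delta).fit(X, sub_labels)
    return clf, sub_to_class

def predict_classes(clf, sub_to_class, X_test):
    """Predict a subclass h for each test sample, then report the class that contains h."""
    sub_pred = clf.predict(X_test)
    return np.array([sub_to_class[h] for h in sub_pred])
```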
6. CAPTURING HETEROGENEITY
In discriminating an "abnormal" from a "normal" group, the average gene expression may not differ between the groups. However, the variability in expression may be greater in the abnormal group, due to heterogeneity in the abnormal population. This is illustrated in Figure 14. Nearest centroid classification will not work in this case, since the class centroids are not separated. The subclass method of the previous section might help: we propose an alternative approach here.

We define new features x′_ij = |x_ij − m̄_i|, where m̄_i is the mean expression for gene i in the normal group. Then we apply nearest shrunken centroids to the new features x′_ij.
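A minimal sketch of this transformation, assuming the rows of a matrix X are samples, the columns are genes, and normal_mask flags the normal-group samples (all names here are illustrative):

```python
import numpy as np

def heterogeneity_features(X, normal_mask):
    """Replace each expression value by its absolute deviation from the
    gene-wise mean of the normal group: x'_ij = |x_ij - m_i|."""
    m = X[normal_mask].mean(axis=0)   # mean expression of each gene in the normal group
    return np.abs(X - m)

# The transformed matrix is then passed to the nearest shrunken centroid
# classifier in place of the raw expression values.
```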
To illustrate this, we generated the expression of 1,000 genes in 40 samples—20 from a normal group and 20 from an abnormal group. All expression values were generated independently as standard Gaussian except for the first 200 genes in the abnormal group, which had mean zero, but standard deviation 2. An independent test set of size 200 was also generated. Nearest centroid shrinkage on the transformed features x′_ij showed a test error rate of near zero, with 150 or more nonzero genes. Figure 15 compares the results of nearest shrunken centroids on the raw expression values x_ij and the transformed expression values x′_ij. Nearest centroid shrinkage on the raw values does poorly with an error rate greater than 40%, while use of the transformed values reduces the error rate to near zero.

By transforming to the distance from the normal centroid, the use of the features x′_ij might also provide discrimination in situations where the abnormal class is not heterogeneous, but is instead mean-shifted. The right panel of Figure 15 investigates this. The expression of the first 200 genes in the abnormal class has mean 0.5 and standard deviation 1 (versus 0 and 1 for the normal class). Now nearest shrunken centroids on the raw features is much more powerful, while use of the transformed features works poorly. We conclude that use of neither the raw nor transformed features dominates the other, and both should be tried on a given problem.

We have successfully used the heterogeneity model to predict toxicity from radiation sensitivity using transcriptional responses to DNA damage in lymphoid cells (Rieger et al., 2003).

FIG. 13. NCI subclass results. Shown are pairs of centroids for each class for the genes that survived the thresholding.

FIG. 14. Illustration of heterogeneity in gene expression. Abnormal group A has the same average gene expression as the normal group N, but shows larger variability.
FIG. 15. Left: Test error for data simulated from the heterogeneous two-class problem, using nearest shrunken centroids on raw expression values (red) and transformed expression values |x_ij − m̄_i| (blue). Right: Same as in left panel, but data are simulated from the mean-shifted homogeneous two-class problem.
7. RELATIONSHIP TO OTHER APPROACHES

The discriminant scores (6) are similar to those used in linear discriminant analysis (LDA), which arise from using the Mahalanobis metric to compute the distance to centroids:

(11) δ_k^LDA(x*) = (x* − x̄_k)^T W^{−1} (x* − x̄_k) − 2 log π_k.

Here we use vector notation and W is the pooled within-class covariance matrix. With thousands of genes and tens of samples (p ≫ n), W is huge and any sample estimate will be singular (and hence its inverse is undefined). Our scores can be seen to be a heavily restricted form of LDA, necessary to cope with the large number of variables (genes). The differences are the following:

• We assume a diagonal within-class covariance matrix for W; without this, LDA would be ill-conditioned and fail.
• We use shrunken centroids rather than centroids as a prototype for each class.
• As the shrinkage parameter Δ increases, an increasing number of genes will have all their d′_ik = 0, k = 1, . . . , K, due to the soft thresholding in (5). Such genes contribute no discriminatory information in (6), and in fact cancel in Equation (8).

Both our scores (6) and the LDA scores (11) are linear in x_i*. If we expand the square in (6), discard the terms involving x_i*² (since they are independent of the class index k and hence do not contribute toward class discrimination) and multiply by −1/2, we get

(12) δ̃_k(x*) = Σ_{i=1}^p x_i* x̄′_ik / s_i² − (1/2) Σ_{i=1}^p (x̄′_ik)² / s_i² + log π_k,

which is linear in x_i*. Because of the sign change, our rule classifies to the largest δ̃_k(x*). Likewise the LDA discriminant scores have the equivalent linear form

(13) δ̃_k^LDA(x*) = x*^T W^{−1} x̄_k − (1/2) x̄_k^T W^{−1} x̄_k + log π_k.

Regularized discriminant analysis (RDA; Friedman, 1989) leaves the centroids alone and modifies the covariance matrix in a different way,

(14) δ_k^RDA(x*) = (x* − x̄_k)^T (W + λI)^{−1} (x* − x̄_k),

where λ is a parameter (like our Δ). The fattened W + λI is nonsingular, and as λ gets large, this procedure approaches the nearest centroid procedure (with no variance scaling or centroid shrinking). A slightly modified version uses W + λD, where D = diag(s_1², s_2², . . . , s_p²). As λ gets large, this approaches the variance weighted nearest centroid procedure. In practice, we normalize this regularized covariance by dividing by 1 + λ, leading to the convex combination (1 − α)W + αD, where α = λ/(1 + λ). Although the relative distances do not change, this is important when making the adjustment for the class priors.

Although RDA shows some promise, it is more complicated than our nearest shrunken centroid procedure. Furthermore, in the process of its regularization, it does not select a subset of genes as the shrunken centroid procedure does. We are considering other hybrid approaches of RDA and nearest centroids in ongoing research projects.
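The regularized covariance described above is easy to write down directly. The sketch below forms the convex combination (1 − α)W + αD and evaluates an RDA-style score in the spirit of (14); it is a plain illustration of the formulas, not the implementation used in the paper, the prior adjustment is optional (as in (6)), and all names are ours.

```python
import numpy as np

def rda_scores(x_star, centroids, W, s2, alpha, priors=None):
    """Score a test sample against each class centroid using the
    regularized covariance (1 - alpha) * W + alpha * D, where D = diag(s_i^2)."""
    cov = (1.0 - alpha) * W + alpha * np.diag(s2)   # nonsingular for alpha > 0
    cov_inv = np.linalg.inv(cov)
    scores = []
    for k, xbar in enumerate(centroids):
        diff = x_star - xbar
        score = diff @ cov_inv @ diff               # Mahalanobis-type distance
        if priors is not None:
            score -= 2.0 * np.log(priors[k])        # optional class-prior correction
        scores.append(score)
    return np.array(scores)                          # classify to the smallest score
```

Note that alpha = 1 (the λ → ∞ limit) reduces this to the variance-weighted nearest centroid rule, matching the limiting behavior described in the text.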
FIG. 16. Simulation results: bias and variance (top panels) and mean-squared error and misclassification error (bottom panels) for linear discriminant analysis and the nearest centroid classifier. Details of the simulation are given in the text. The nearest centroid classifier outperforms LDA because of its smaller variance.
8. NEAREST CENTROID CLASSIFIER VERSUS LDA

As discussed in the previous section, the nearest centroid classifier is equivalent to Fisher's linear discriminant analysis if we restrict the within-class covariance matrix to be diagonal. When is this restriction a good one?

Consider a two class microarray problem with p genes and n samples. For simplicity we consider the standard (unshrunken) nearest centroid classifier and standard (full within covariance) LDA. The recent thesis of Levina (2002) did some theoretical comparisons of these methods. She assumed p → ∞, n → ∞ and p/n → γ ∈ (0, 1), and analyzed the worst case error of each method. The relative performance of the two methods depends on the correlation structure of the features (samples). Her results show that if p is a large fraction of n, for a large class of correlation structures, nearest centroid classification outperforms full LDA.

Now in our problem, usually we have p ≫ n: in that case, LDA is not even defined without some regularization. Hence to proceed we assume that p is a little less than n and hope that what we learn will extend to the case p > n. Let x_j be a p-vector of gene expression values in class j. Suppose x_1 ∼ N(0, Σ) and x_2 ∼ N(µ, Σ), where Σ is a full (nondiagonal) matrix. Then LDA uses the maximum likelihood unbiased estimate of Σ^{−1}µ, while nearest centroid uses a biased estimate. However, the LDA method estimates Σ^{−1}µ in a multivariate manner, and hence will tend to have higher variance. What is the resulting bias–variance tradeoff and how does it translate into misclassification error?

We did an experiment with p = 30 and n = 40, with 20 samples in each of two classes. We set the ijth element of Σ to ρ^{|i−j|}, where ρ was varied from 0 to 0.8. Each of the components of the mean vector µ was set to ±1 at random: such a mixed vector is needed to give full LDA a potential advantage over LDA with a diagonal covariance. For each simulation, an independent test set of size 500 was also generated. The results of 100 simulations from this model are shown in Figure 16. Bias, variance and mean-squared error refer to estimation of Σ^{−1}µ. For small correlations, the underlying (diagonal covariance) model for nearest centroids is approximately correct and the method wins; LDA shows a small improvement in bias for larger correlations, but this is more than offset by the increased variance. Overall the nearest centroid method has lower mean-squared error and test misclassification error in all cases.

Now for real microarray problems, p ≫ n, and both LDA and nearest centroid methods can be improved by appropriate regularization or shrinkage.
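To make the comparison concrete, a small sketch along the lines of this experiment is given below, using scikit-learn's LinearDiscriminantAnalysis for full LDA and NearestCentroid for the diagonal competitor; the AR(1)-type covariance ρ^{|i−j|} and the ±1 mean vector follow the description above, while the specific function and variable names are ours.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import NearestCentroid

def simulate(p=30, n_per_class=20, n_test=500, rho=0.5, seed=0):
    """Draw training and test data from two Gaussian classes that differ in
    mean by a mixed +/-1 vector and share the covariance Sigma_ij = rho**|i-j|."""
    rng = np.random.default_rng(seed)
    Sigma = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    mu = rng.choice([-1.0, 1.0], size=p)
    L = np.linalg.cholesky(Sigma)

    def draw(n, shift):
        return rng.standard_normal((n, p)) @ L.T + shift

    X = np.vstack([draw(n_per_class, 0.0), draw(n_per_class, mu)])
    y = np.repeat([0, 1], n_per_class)
    Xt = np.vstack([draw(n_test // 2, 0.0), draw(n_test // 2, mu)])
    yt = np.repeat([0, 1], n_test // 2)
    return X, y, Xt, yt

X, y, Xt, yt = simulate(rho=0.5)
for clf in (LinearDiscriminantAnalysis(), NearestCentroid()):
    err = np.mean(clf.fit(X, y).predict(Xt) != yt)
    print(type(clf).__name__, round(float(err), 3))
```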
We have not included regularization in the above comparison, but the above results suggest that the bias–variance tradeoff will cause the nearest centroid method to outperform full LDA.

9. DISCUSSION

The nearest shrunken centroid classifier is potentially useful in any high-dimensional classification problem. In addition to its application to gene expression arrays, it could also be applied to other kinds of emerging genomic data, including mass spectroscopy for protein measurements, tissue arrays and single nucleotide polymorphism arrays.

Our proposal can also be applied in conjunction with unsupervised methods. For example, it is now standard to use hierarchical clustering methods on expression arrays to discover clusters in the samples (Eisen, Spellman, Brown and Botstein, 1998). The methods described here can identify subsets of the genes that succinctly characterize each cluster.

Finally, we touch on computational issues. The computations involved in the nearest shrunken centroid method are straightforward. One important detail: in the denominator of the statistics d_ik in Equation (1) we add the same positive constant s_0 to each of the s_i values. This guards against the possibility of large d_ik values arising by chance from genes at very low expression levels. We set s_0 equal to the median value of the s_i over the set of genes. A similar strategy was used in the significance analysis of microarrays (SAM) methodology of Tusher, Tibshirani and Chu (2001).

We have developed a package in the Excel and R languages called prediction analysis for microarrays. It implements all of the nearest shrunken centroids methodology discussed in this article and is available at the website http://www-stat.stanford.edu/∼tibs/PAM.

REFERENCES

Alizadeh, A. A., Eisen, M. B., Davis, R. E., Ma, C., Lossos, I. S., Rosenwald, A., Boldrick, J. C., Sabet, H., Tran, T., Yu, X., Powell, J. I., Yang, L., Marti, G. E., Moore, T., Hudson, Jr., J., Lu, L., Lewis, D. B., Tibshirani, R., Sherlock, G., Chan, W. C., Greiner, T. C., Weisenburger, D. D., Armitage, J. O., Warnke, R., Levy, R., Wilson, W., Grever, M. R., Byrd, J. C., Botstein, D., Brown, P. O. and Staudt, L. M. (2000). Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403 503–511.

Ambroise, C. and McLachlan, G. (2002). Selection bias in gene extraction on the basis of microarray gene-expression data. Proc. Natl. Acad. Sci. U.S.A. 99 6562–6566.

Donoho, D. and Johnstone, I. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika 81 425–455.

Eisen, M. B., Spellman, P. T., Brown, P. O. and Botstein, D. (1998). Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. U.S.A. 95 14863–14868.

Friedman, J. (1989). Regularized discriminant analysis. J. Amer. Statist. Assoc. 84 165–175.

Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M., Downing, J. R., Caligiuri, M. A., Bloomfield, C. D. and Lander, E. S. (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 286 531–537.

Hastie, T., Tibshirani, R., Botstein, D. and Brown, P. (2001). Supervised harvesting of expression trees. Genome Biology 2(1) research/0003.

Hedenfalk, I., Duggan, D., Chen, Y., Radmacher, M., Bittner, M., Simon, R., Meltzer, P., Gusterson, B., Esteller, M., Raffeld, M., Yakhini, Z., Ben-Dor, A., Dougherty, E., Kononen, J., Bubendorf, L., Fehrle, W., Pittaluga, S., Gruvberger, S., Loman, N., Johannsson, O., Olsson, H., Wilfond, B., Sauter, G., Kallioniemi, O., Borg, A. and Trent, J. (2001). Gene-expression profiles in hereditary breast cancer. New England Journal of Medicine 344 539–548.

Khan, J., Wei, J., Ringner, M., Saal, L., Ladanyi, M., Westermann, F., Berthold, F., Schwab, M., Antonescu, C., Peterson, C. and Meltzer, P. (2001). Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine 7 673–679.

Levina, E. (2002). Statistical issues in texture analysis. Ph.D. dissertation, Dept. Statistics, Univ. California, Berkeley.

Ramaswamy, S., Tamayo, P., Rifkin, R., Mukherjee, S., Yeang, C., Angelo, M., Ladd, C., Reich, M., Latulippe, E., Mesirov, J., Poggio, T., Gerald, W., Loda, M., Lander, E. and Golub, T. (2001). Multiclass cancer diagnosis using tumor gene expression signatures. Proc. Natl. Acad. Sci. U.S.A. 98 15149–15154.

Rieger, K., Hong, W., Tusher, V., Tang, J., Tibshirani, R. and Chu, G. (2003). Toxicity of radiation therapy associated with abnormal transcriptional responses to DNA damage. Submitted.

Ross, D., Scherf, U., Eisen, M., Perou, C., Rees, C., Spellman, P., Iyer, V., Jeffery, S., van de Rijn, M., Waltham, M., Pergamenschikov, A., Lee, J., Lashkari, D., Shalon, D., Myers, T., Weinstein, J., Botstein, D. and Brown, P. (2000). Systematic variation in gene expression patterns in human cancer cell lines. Nature Genetics 24 227–235.

Tibshirani, R., Hastie, T., Narasimhan, B. and Chu, G. (2002). Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc. Natl. Acad. Sci. U.S.A. 99 6567–6572.

Tusher, V. G., Tibshirani, R. and Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl. Acad. Sci. U.S.A. 98 5116–5121.