Briefings in Bioinformatics, 22(1), 2021, 334–345

doi: 10.1093/bib/bbaa007
Advance Access Publication Date: 7 February 2020
Review Article

High-dimensional variable selection for ordinal outcomes with error control
Han Fu and Kellie J. Archer
Corresponding author: Kellie J. Archer, The Ohio State University, Columbus, OH 43210, USA. Tel.: +1-614-247-6167; Fax: +1-614-247-1846;
E-mail: [email protected]

Abstract
Many high-throughput genomic applications involve a large set of potential covariates and a response which is frequently
measured on an ordinal scale, and it is crucial to identify which variables are truly associated with the response. Effectively
controlling the false discovery rate (FDR) without sacrificing power has been a major challenge in variable selection
research. This study reviews two existing variable selection frameworks, model-X knockoffs and a modified version of
reference distribution variable selection (RDVS), both of which utilize artificial variables as benchmarks for decision making.
Model-X knockoffs constructs a ‘knockoff’ variable for each covariate to mimic the covariance structure, while RDVS
generates only one null variable and forms a reference distribution by performing multiple runs of model fitting. Herein, we
describe how importance measures for ordinal responses, based on either penalized regression or machine learning techniques,
can be constructed to fit into these two selection frameworks. We compared these measures in terms of FDR and power
using simulated data. Moreover, we applied these two frameworks to high-throughput methylation data
for identifying features associated with the progression from normal liver tissue to hepatocellular carcinoma to further
compare and contrast their performances.

Key words: false discovery rate; ordinal regression; knockoff filter; L1 penalization; boosting; ordinal forests

Han Fu is a PhD student in biostatistics at the Ohio State University. Her main research interest lies in cancer genomics, high-dimensional data analysis, and precision medicine.
Kellie J. Archer is a professor and the chair of the Division of Biostatistics at The Ohio State University. Her primary research area has been developing statistical methods for analyzing high-throughput genomic data.
Submitted: 23 September 2019; Received (in revised form): 6 January 2020

© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected]

Introduction

Ordinal measurement scales are commonly employed in the medical literature [1]. Health status and patient outcomes are often evaluated on an ordinal scale [2]. For example, in a hepatocellular carcinoma (HCC) study, liver tissue samples can be classified into one of three ordered categories: normal, cirrhotic without concomitant cancer and cirrhotic with HCC [3]. To better understand disease mechanisms, genomic studies have been conducted to identify important genomic features whose characteristics, such as methylation or expression levels, are associated with a clinical response. Such applications often possess two properties, high dimensionality and sparsity: a massive number of features (far more than the sample size) are collected, but we suspect that only a small portion of them are truly associated with the outcome. Thus, effective variable selection procedures for high-dimensional data are needed to select a set of candidate variables for further confirmatory investigation.

Various high-dimensional variable selection or screening methods have been proposed and widely used during the past decades. Examples include the least absolute shrinkage and selection operator (LASSO) [4], boosting [5] and sure independence screening [6]. These methods focus mainly on providing a small set of potentially relevant variables, but not necessarily the 'smallest'. In other words, the set provided by these methods is very likely to contain redundant or noise variables, which are also known as false positives. Controlling the false discovery rate (FDR), the expected proportion of false positives among all discoveries, was initially suggested by Benjamini and Hochberg [7] and has quickly established itself in the statistics literature. As the canonical approach to FDR control,

the Benjamini–Hochberg (BH) procedure [7] requires a set of valid P-values, and it controls the FDR exactly when the P-values are independent or have positive regression dependency on a subset [8]. A modified version, the Benjamini–Yekutieli (BY) procedure, controls the FDR when the P-values have arbitrary forms of dependence [8], but it may suffer from power loss compared with BH [9]. Both procedures assume that valid P-values can be obtained, which may be difficult for high-dimensional conditional modeling problems. Using marginal P-values from feature-by-feature hypothesis tests is a possible solution for high-dimensional data, but power and interpretability are sacrificed to some extent in most cases [10].

The model-X knockoff approach [10] was recently developed as a general variable selection framework with exact finite-sample FDR control that can be applied to high-dimensional conditional testing problems. The framework generalizes the original knockoff procedure [9], designed for linear models, to arbitrary nonlinear relationships between the response and the covariates, without relying on P-value calculation. Briefly, a set of 'knockoff' copies is constructed within the framework to mirror the structure of the original covariates while remaining independent of the response. The knockoff variables can thus be used as negative controls for the real covariates, so that true signals can be teased apart from noise variables to ensure FDR control. Both Gaussian and binomial responses were considered in the original article [10], and although no special attention was paid to ordinal outcomes, the framework still applies. Another framework that shares similar underlying philosophies with knockoffs, the reference distribution variable selection (RDVS) method [11], generates one artificial null variable to form an empirical null distribution for making inferences as in the hypothesis testing setting. The RDVS was originally described as a Bayesian screening approach using a Gaussian spatial process model, introduced in the field of computer experiments where a limited number of covariates are typically considered, but it naturally extends to a general selection framework that can accommodate arbitrary outcome types and data dimensionality. As with the model-X knockoff approach, the original RDVS method did not accommodate ordinal response data but nonetheless can be used in ordinal response modeling.

In this article, we review the model-X knockoff and reference distribution variable selection frameworks and examine their performance with respect to FDR control for ordinal response models, in comparison to their corresponding traditional base learners that do not apply a framework. Specifically, we describe multiple variable importance measures for ordinal outcomes using various techniques, including L1 penalized regression, boosting and ordinal forests, to make both frameworks applicable to ordinal response data. The remainder of the article is organized as follows. Section 2 reviews the two variable selection frameworks as well as the importance measures that can be used for ordinal outcomes, and describes the simulation study designed to compare various base learners with and without these frameworks. Results from both the simulation study and the case application are reported in Section 3. Section 4 concludes the article with a discussion.

Methods

Problem statement

Consider the problem where we have $n$ independent observations and $p$ predictors ($p > n$) for an ordinal response $Y$, which takes on one of $K$ ordinal levels. Let $X = (X_1, \ldots, X_p)$ denote the random vector of the predictors and $\mathbf{X}$ denote the $n \times p$ design matrix. We would like to identify the subset of predictors that are truly associated with $Y$ while making as few false discoveries as possible. Strictly speaking, our main objective is to identify the 'smallest' subset $S$ such that, given the covariates in the subset, $Y$ is conditionally independent of all other variables. The subset $S$ is called the Markov blanket in the graphical model literature [12]. The variables in the subset $S$, $\{X_j\}_{j \in S}$, are truly associated with the response, and those not in $S$, $\{X_j\}_{j \notin S}$, are not associated with the response (null).

The model-X knockoff method tests conditional hypotheses that are mathematically equivalent to the objective stated above under mild conditions [10] (see Section 2.3.1 for details), and thus addresses the problem of interest precisely. In contrast, the RDVS framework tests hypotheses of a marginal flavor, because it compares each single statistic to an empirical reference distribution separately. In this regard, the RDVS does not fully address the stated problem. However, it pools information across predictors by utilizing statistics from multivariable models, instead of looking at one variable at a time. We therefore examine its performance in solving the conditional problem, i.e. identifying the variables in the Markov blanket, given which the outcome is independent of all others.

Model assessment

The FDR is a commonly used measure for quantifying false positive findings. This section presents the concepts of FDR and power in the context of variable selection. Suppose we have a selection method that identifies a subset of the $p$ predictors, denoted by $\hat{S}$, and claims that $\{X_j\}_{j \in \hat{S}}$ are significant while $\{X_j\}_{j \notin \hat{S}}$ are null. Denote the number of truly null variables that are claimed to be significant (i.e. false positives or Type I errors) by $n_{01}$, the number of truly null variables claimed to be null by $n_{00}$, the number of truly associated variables claimed to be significant by $n_{11}$, and the number of truly associated variables claimed to be null by $n_{10}$. In the language of sets, $n_{00} = \#\{j : j \in (S \cup \hat{S})^c\}$, $n_{01} = \#\{j : j \in \hat{S} \setminus S\}$, $n_{10} = \#\{j : j \in S \setminus \hat{S}\}$ and $n_{11} = \#\{j : j \in \hat{S} \cap S\}$, where $S$ is the Markov blanket. Then, the false discovery proportion (FDP) is defined as the proportion of false positives among all discoveries, i.e. $n_{01}/(n_{01} + n_{11})$. The FDR is the expected value of the FDP, and we can use the FDP to estimate the FDR. The power (or sensitivity), on the other hand, can be estimated by the proportion of true signals being identified, defined as $n_{11}/(n_{11} + n_{10})$. In the ideal case, we would discover all true signals without false positives, such that the FDR is zero and the power is one.

Selection frameworks

Model-X knockoffs

The knockoff framework [10] estimates the covariate distribution $F_X$ from the observations, which are assumed to be drawn from some joint distribution of $(X, Y)$. It tests the hypotheses of conditional independence, i.e. whether $X_j$ is independent of $Y$ conditional on all other variables. A false positive under this framework is thus defined as selecting a variable $X_j$ that is independent of $Y$ given the others. In the special case of generalized linear models, these hypotheses are equivalent to the traditional parametric hypotheses of whether the regression coefficient is zero. Under weak regularity conditions, the set of relevant variables defined by these hypotheses is guaranteed to be unique, and coincides with the Markov blanket $S$ described in Section 2.1, according to the graphical model literature [13].
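To make the FDP and power definitions from the Model assessment subsection concrete, here is a minimal Python sketch (not the authors' code) that counts $n_{01}$, $n_{11}$ and $n_{10}$ for a single selected set $\hat{S}$ against the true set $S$. The function name `fdp_and_power` is hypothetical, and returning 0 when a denominator is zero (e.g. nothing selected) is an assumed convention.

```python
def fdp_and_power(selected, true_set):
    """False discovery proportion and power for one selected set.

    selected : iterable of selected variable indices (S-hat)
    true_set : iterable of indices of truly associated variables (S)
    """
    s_hat, s = set(selected), set(true_set)
    n11 = len(s_hat & s)  # truly associated and selected
    n01 = len(s_hat - s)  # truly null but selected (false positives)
    n10 = len(s - s_hat)  # truly associated but missed
    # Assumed convention: a ratio with a zero denominator is reported as 0.
    fdp = n01 / (n01 + n11) if (n01 + n11) > 0 else 0.0
    power = n11 / (n11 + n10) if (n11 + n10) > 0 else 0.0
    return fdp, power


if __name__ == "__main__":
    # Toy example: true signals {0, ..., 4}; a method selects 0, 1, 2 and 7.
    fdp, power = fdp_and_power([0, 1, 2, 7], range(5))
    print(f"FDP = {fdp:.2f}, power = {power:.2f}")  # FDP = 0.25, power = 0.60
```

In a simulation, averaging the FDP and power returned for each replicate gives the FDR and power estimates described above.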

Specifically, the knockoff method [10] requires the construction of knockoff variables $\tilde{X} = (\tilde{X}_1, \ldots, \tilde{X}_p)$, which satisfy two properties:

(1) Pairwise exchangeability: for any subset $B \subset \{1, \ldots, p\}$, $(X, \tilde{X})_{\mathrm{swap}(B)} \overset{d}{=} (X, \tilde{X})$, i.e. swapping the entries $X_j$ and $\tilde{X}_j$ for each $j \in B$ leaves the joint distribution invariant;

(2) Conditional independence: $\tilde{X} \perp Y \mid X$.

Given these two properties, knockoff variables are constructed to mimic the covariance structure of the original covariates while remaining as different as possible from the corresponding covariates, which boosts power. Since knockoff variables are conditionally independent of the response, they can serve as benchmarks that make controlled variable selection possible. In this article, knockoffs are constructed using the second-order approximate semidefinite program approach by Candès et al. [10] with the knockoff R package [14], assuming that the covariates follow a multivariate normal distribution. A James-Stein-type shrinkage estimator for the covariance matrix is used to produce a positive-definite and well-conditioned matrix. All covariates are centered and scaled prior to the analysis.

After generating the knockoff variables, the original design matrix is augmented with the knockoffs to form an extended $n \times 2p$ matrix $[\mathbf{X}, \tilde{\mathbf{X}}]$. Variable importance measures designed for ordinal responses, described in Section 2.4, can be used to assess the importance of the $2p$ features in explaining the ordinal outcome $Y$. The importance measures, $Z_j$, may be based on a fitted regression model, such as the absolute values of the regression coefficients, or on by-products of machine learning techniques, such as risk reduction in boosting. Since only the general selection frameworks are described in this section, we suppose for now that we have decided on a variable importance measure and obtained the $2p$ measurements accordingly, denoted by $Z_1, \ldots, Z_{2p}$. Then, the statistic used for selecting $X_j$, for $j \in \{1, \ldots, p\}$, is calculated as the difference in importance measurements between the original variable and its knockoff, $W_j = Z_j - Z_{j+p}$. Other statistics are acceptable if they satisfy the flip-sign property required by model-X knockoffs [10], which says that swapping a variable with its knockoff changes the sign of the statistic. A large positive value typically provides evidence of an important variable, while the statistics for null variables tend to be symmetrically distributed around zero with small magnitudes. In this article, only the differences in importance measurements are considered.

After computing the importance difference statistics $W_j$ for $j \in \{1, \ldots, p\}$, the following procedure guarantees strict FDR control according to [10]. A data-dependent threshold is first given by

$$\tau = \min\left\{ t > 0 : \frac{1 + \#\{j : W_j \le -t\}}{\#\{j : W_j \ge t\}} \le q \right\} \qquad (1)$$

where $q$ denotes the target FDR level. Then, the variables whose statistic exceeds the threshold are selected. Since the model-X knockoff framework places no restriction on the data dimensionality or the conditional distribution of $Y \mid X$, the framework applies in theory to high-dimensional ordinal response data with FDR control, as long as appropriate importance measures are available.

Reference distribution variable selection

RDVS was originally described as a Bayesian screening approach designed exclusively for Gaussian processes [11] but naturally extends to a general selection framework, which can accommodate arbitrary outcome types and feature importance measures. The framework requires us to generate only one new 'null' variable $X^\dagger$ for the $n$ observations. Ideally, the null variable should be orthogonal to all variables in the design space, so that it will not interfere with the inference on the original variables. In practice, however, such a variable would be difficult to identify, and one may not exist. To address this, the RDVS method randomly samples from the design space of $X$ multiple ($M$) times after centering and scaling all covariates, so that there are $M$ realizations of the null variable, denoted by $X^{\dagger(1)}, \ldots, X^{\dagger(M)}$. Thus, the potential correlation with the original variables can be reduced through an averaging effect. In other words, although only one null variable needs to be generated each time, multiple replications of null variable generation and model fitting are required.

The specific procedure is as follows. In the $m$th run, $m = 1, \ldots, M$, a new $X^{\dagger(m)}$ is sampled, for example from a standard normal distribution, giving an augmented $n \times (p + 1)$ design matrix $[\mathbf{X}, X^{\dagger(m)}]$. Again, we suppose that we have decided on a variable importance measure for explaining the response $Y$ and obtain the $p + 1$ measurements accordingly, denoted by $Z_1^{(m)}, \ldots, Z_{p+1}^{(m)}$, where $Z_{p+1}^{(m)}$ corresponds to $X^{\dagger(m)}$. For example, we can fit $Y$ against $[\mathbf{X}, X^{\dagger(m)}]$ using a generalized linear model and use the absolute values of the regression coefficients as the importance measurements $Z_1^{(m)}, \ldots, Z_{p+1}^{(m)}$. Repeating this step $M$ times, we have $M$ observations of the importance measurement for every single variable, including the null variable $X^\dagger$. For the $p$ original variables, the sample mean of the $M$ measurements,

$$W_j = \frac{1}{M} \sum_{m=1}^{M} Z_j^{(m)}, \quad j \in \{1, \ldots, p\} \qquad (2)$$

can be used as a summary statistic reflecting relative variable importance. The $M$ measurements for $X^\dagger$, i.e. $Z_{p+1}^{(1)}, \ldots, Z_{p+1}^{(M)}$, are combined to form a reference distribution for the null variable, against which the relative variable importance can be assessed. The RDVS method then uses this empirical distribution as a null distribution and borrows ideas from one-sided hypothesis testing. Assuming a larger statistic indicates a more important variable, we can calculate empirical P-values as the proportion of times the samples in the reference distribution exceed the observed statistic $W_j$:

$$\mathrm{pval}_j = \#\{m : Z_{p+1}^{(m)} > W_j\} / M, \quad j \in \{1, \ldots, p\} \qquad (3)$$

Considering the potentially strong dependency among the resulting empirical P-values, the BY procedure [8] can be applied to adjust the P-values for approximate FDR control. The variables with adjusted P-values smaller than the target FDR level $q$ are selected.

In the spirit of one-sided hypothesis testing, the RDVS framework roughly tests whether the statistic $W_j$ for variable $j$ comes from the same distribution as that of the artificial variable $X^\dagger$. However, these hypotheses may not be valid in circumstances where the covariates are not independent of one another. The variable $X^\dagger$ is independent of both $X = (X_1, \ldots, X_p)$ and $Y$ by construction; while any truly null variable(s) in $X$ are conditionally independent of $Y$, they may be dependent on some significant variables in the Markov blanket $S$. In this case, the truly null variables will tend to have higher probabilities of being selected than the artificial $X^\dagger$, and hence the empirical distribution of the importance measure for $X^\dagger$ can no longer represent the null distribution well. In contrast, the knockoff framework preserves the covariance structure of the original variables, which makes it more suitable for the case of dependent covariates. A recent article [15] explained and empirically demonstrated the advantage of knockoffs in providing the correct reference distribution over other augmentation methods, including adding independent dummy variables as in the RDVS.

As a summary, Table 1 presents the similarities and differences between the procedures of the two frameworks. Their procedures are similar in general, which makes them comparable, yet the two frameworks differ in each specific step.

Table 1. The similarity and difference between the procedure of the knockoff framework and that of the RDVS framework

| Step | Knockoffs | RDVS |
|---|---|---|
| Augmentation | $\mathbf{X}^* = [\mathbf{X}, \tilde{\mathbf{X}}]$ $(n \times 2p)$ | $\mathbf{X}^* = [\mathbf{X}, X^\dagger]$ $(n \times (p + 1))$ |
| Importance measurements | $Z_1, \ldots, Z_{2p}$ | $Z_1^{(1)}, \ldots, Z_{p+1}^{(1)}, \ldots, Z_1^{(M)}, \ldots, Z_{p+1}^{(M)}$ |
| Statistics | $W_j = Z_j - Z_{j+p}$ | $W_j = \frac{1}{M} \sum_{m=1}^{M} Z_j^{(m)}$ |
| Selection | $W_j > \min\left\{t > 0 : \frac{1 + \#\{j : W_j \le -t\}}{\#\{j : W_j \ge t\}} \le q\right\}$ | Benjamini–Yekutieli adjusted $\mathrm{pval}_j < q$, where $\mathrm{pval}_j$ = empirical P-values from comparing $W_j$ against $Z_{p+1}^{(1)}, \ldots, Z_{p+1}^{(M)}$ |

Variable importance measures

Several variable importance measures designed for ordinal outcomes are described in this section. The design matrix used in this section, denoted by $\mathbf{X}^*$, represents the $n \times 2p$ augmented design matrix $[\mathbf{X}, \tilde{\mathbf{X}}]$ for the knockoff framework, the $n \times (p + 1)$ augmented design matrix $[\mathbf{X}, X^\dagger]$ for the RDVS, and the original (centered and scaled) design matrix $\mathbf{X}$ if no selection framework is applied. The following measures can be used to assess the importance of each variable in $\mathbf{X}^*$ in fitting or predicting the response $Y$.

Measures based on continuation ratio models with L1 penalization

The continuation ratio (CR) model is a type of generalized linear model for ordinal outcomes. The link function is the logit of a conditional probability, formulated as

$$\mathrm{logit}\, P(Y = k \mid Y \le k, X^* = x) = \log \frac{P(Y = k \mid Y \le k, X^* = x)}{P(Y < k \mid Y \le k, X^* = x)} = \alpha_k + \beta^T x \qquad (4)$$

where $k \in \{2, \ldots, K\}$. The CR model can be fitted by augmenting the dataset to represent $K - 1$ conditionally independent binary response datasets, so that standard techniques for modeling binary data can be applied. In addition, the CR model is frequently used when progression through the ordinal levels cannot be reversed, such as stage of cancer [16].

For the purpose of variable selection, Archer and Williams [2] proposed the L1 penalized CR model, which adds the L1 penalty to the likelihood for the CR model, as in the LASSO. The model is given by

$$\underset{\beta}{\arg\max} \left\{ L(\beta \mid y, x) - \lambda \sum_{l=1}^{p} |\beta_l| \right\} \qquad (5)$$

where $L(\beta \mid y, x)$ is the likelihood function for a regular CR model and $\lambda$ is the regularization parameter. In practice, the penalized CR model was fitted by augmenting the dataset and then applying techniques for solving the logistic LASSO, using the glmnetcr R package.

There are two natural approaches for obtaining variable importance measures based on the L1 penalized CR model. One uses the absolute values of the coefficients resulting from fitting the response $Y$ against the design matrix $\mathbf{X}^*$. The regularization parameter $\lambda$ can be selected from a grid of values by minimizing either the AIC or the BIC. Then, the importance measure for $X_j$ can be expressed by

$$Z_j^{\mathrm{AIC}} = \left|\hat{\beta}_j^{\mathrm{AIC}}\right| \qquad (6)$$

or

$$Z_j^{\mathrm{BIC}} = \left|\hat{\beta}_j^{\mathrm{BIC}}\right| \qquad (7)$$

where $\hat{\beta}_j^{\mathrm{AIC}}$ and $\hat{\beta}_j^{\mathrm{BIC}}$ are the estimated coefficients for $X_j$ from the final model tuned by AIC or BIC, respectively, for $j \in \{1, \ldots, p^*\}$ ($p^*$ is the number of columns in $\mathbf{X}^*$ and takes the value $p$, $p + 1$ or $2p$ depending on whether a traditional, RDVS or model-X knockoff framework is used). A larger $Z_j^{\mathrm{AIC}}$ or $Z_j^{\mathrm{BIC}}$ indicates a more important feature. The regularization parameter $\lambda$ can also be tuned by cross-validation (CV). Specifically, we choose the $\lambda$ that gives the smallest 5-fold cross-validated classification error from a grid of 100 values. The coefficients corresponding to the optimal $\lambda$ tuned by CV can then be used to measure importance, expressed by

$$Z_j^{\mathrm{cv}} = \left|\hat{\beta}_j^{\mathrm{cv}}\right| \qquad (8)$$

for $j \in \{1, \ldots, p^*\}$.

The other variable importance measure is based on the solution path, i.e. the penalty at which variable $j$ first enters the model. The formal definition is

$$Z_j^{\mathrm{path}} = \sup\{\lambda : \hat{\beta}_j(\lambda) \ne 0\} \qquad (9)$$

for $j \in \{1, \ldots, p^*\}$. A larger $Z_j^{\mathrm{path}}$ means that the variable $X_j$ enters the model earlier and is thus more important.
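The knockoff threshold in Equation (1) can be sketched in Python as follows. These are hypothetical helpers written for illustration, not the knockoff R package's implementation; they assume, as in Candès et al. [10], that the candidate thresholds are the distinct nonzero magnitudes $|W_j|$, and they select the variables with $W_j \ge \tau$, the convention used there.

```python
import numpy as np


def knockoff_threshold(W, q=0.2):
    """Threshold tau of Equation (1) for knockoff statistics W.

    Candidate values of t are the distinct nonzero magnitudes |W_j|.
    Returns np.inf when no candidate satisfies the inequality.
    """
    W = np.asarray(W, dtype=float)
    candidates = np.unique(np.abs(W[W != 0]))  # np.unique sorts ascending
    for t in candidates:
        # (1 + #{j: W_j <= -t}) / #{j: W_j >= t}, with the denominator floored at 1
        ratio = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if ratio <= q:
            return t  # first qualifying t in ascending order is the minimum
    return np.inf


def knockoff_select(W, q=0.2):
    """Indices j with W_j >= tau (empty when tau is infinite)."""
    W = np.asarray(W, dtype=float)
    return np.flatnonzero(W >= knockoff_threshold(W, q))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical statistics: 10 large positive signals plus 90 nulls
    # drawn symmetrically around zero.
    W = np.concatenate([rng.uniform(3, 5, 10), rng.normal(0, 1, 90)])
    print(knockoff_select(W, q=0.2))
```

The "1 +" in the numerator is what makes this the conservative (knockoff+) version of the threshold; when no candidate qualifies, this sketch returns an infinite threshold and therefore selects nothing.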

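The RDVS summary statistic and empirical P-values of Equations (2) and (3), followed by the BY adjustment, can likewise be sketched in Python. The sketch assumes the $M$ model fits have already been run and their importance measurements stored row-wise in an array `Z` of shape $(M, p + 1)$, with the null variable $X^\dagger$ in the last column. The function names are hypothetical, and `by_adjust` mirrors the standard step-up BY formula (the same computation as R's `p.adjust(method = "BY")`).

```python
import numpy as np


def rdvs_pvalues(Z):
    """Statistics W_j (Equation 2) and empirical P-values (Equation 3).

    Z : array of shape (M, p + 1); row m holds Z_1^(m), ..., Z_{p+1}^(m),
        and the last column belongs to the null variable X-dagger.
    """
    Z = np.asarray(Z, dtype=float)
    M = Z.shape[0]
    W = Z[:, :-1].mean(axis=0)  # Equation (2): mean over the M runs
    ref = Z[:, -1]              # reference distribution of the null variable
    # Equation (3): proportion of null draws strictly exceeding each W_j
    pval = (ref[None, :] > W[:, None]).sum(axis=1) / M
    return W, pval


def by_adjust(pval):
    """Benjamini-Yekutieli adjusted P-values (valid under arbitrary dependence)."""
    pval = np.asarray(pval, dtype=float)
    p = pval.size
    c_p = np.sum(1.0 / np.arange(1, p + 1))  # harmonic correction factor
    order = np.argsort(pval)[::-1]           # largest P-value first
    ranks = np.arange(p, 0, -1)              # ranks p, p-1, ..., 1 of those values
    adj = np.minimum.accumulate(pval[order] * p * c_p / ranks)
    out = np.empty(p)
    out[order] = np.minimum(adj, 1.0)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M, p = 100, 5
    # Hypothetical importance draws: variables 0 and 1 carry signal, 2-4 are null.
    Z = np.abs(rng.normal(0, 1, size=(M, p + 1)))
    Z[:, :2] += 5.0
    W, pval = rdvs_pvalues(Z)
    print(np.flatnonzero(by_adjust(pval) < 0.2))  # selected at q = 0.2
```

Because $\mathrm{pval}_j$ can only take the values $0, 1/M, \ldots, 1$, the number of runs $M$ limits how small the empirical P-values, and hence the BY-adjusted values, can be.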
Risk reduction in boosting

Boosting is a machine learning ensemble method that combines weak learners into a strong learner to increase predictive ability and reduce over-fitting. Model-based boosting can use component-wise least-squares estimates as base-learners to fit generalized linear models, such as the cumulative logit (CL) model, which is commonly used for ordinal data. The CL model links the logit of cumulative probabilities to covariates, formulated as

$$\mathrm{logit}\, P(Y \le k \mid X^* = x) = \log \frac{P(Y \le k \mid X^* = x)}{P(Y > k \mid X^* = x)} = \alpha_k + \beta^T x \qquad (10)$$

where $k \in \{1, \ldots, K - 1\}$.

We use the mboost R package [17] to implement the boosting method for ordinal responses. The gradient boosting algorithm [5] was applied to the CL models using the default component-wise smooth P-spline base-learners [18]. This flexibility allows us to fit more complex relationships between $Y$ and $\mathbf{X}^*$ than those specified by a regular CL model. There is a unique base-learner for each variable in $\mathbf{X}^*$, since no interaction term is specified. The 5-fold cross-validated estimate of the empirical risk was used to select the optimal number of boosting iterations. Variable importance can be measured by the accumulated in-bag risk reduction per boosting step for each base-learner or variable, provided by the varimp function in the mboost package. We denote the measurements by $Z_j^{\mathrm{mboost}}$ for $j \in \{1, \ldots, p^*\}$. The larger the risk reduction $Z_j^{\mathrm{mboost}}$, the more important the variable $X_j$ appears to be in predicting $Y$. The boosting measure has been shown to perform well within the knockoff framework when the outcome is continuous, binary or time-to-event [19].

Ordinal random forests

The random forest (RF) methodology [20] is another ensemble learning method for classification and regression. As an extension of bagging classification trees, it selects a random subset of features at each node, so that the resulting trees are less correlated, which improves prediction accuracy. The RF method has recently been generalized to incorporate the ordering information of ordinal responses, using either conditional inference trees [21] or classical regression trees [22]. The latter method, 'ordinal forests' [22], provides a variable importance measure for high-dimensional data, which can be incorporated into the frameworks above. Similar to the ordered probit regression model, the ordinal forest method assumes a latent continuous variable underlying the ordered response. The method [22] uses out-of-bag misclassification error estimates to construct the forest and to compute the variable importance measurements, which can be obtained using the ordfor function in the ordinalForest R package [23]. We denote the measurements by $Z_j^{\mathrm{ordinalForest}}$ for $j \in \{1, \ldots, p^*\}$. An important variable tends to have a large positive value of $Z_j^{\mathrm{ordinalForest}}$, while a null variable tends to have a measurement around zero. The method is not sparse by itself, in the sense that most of the measurements are close to, but not exactly, zero.

Other measures

Elastic net regularization, which is a linear combination of the L1 and L2 penalties, overcomes some limitations of the LASSO, such as selecting only one variable from a group of highly correlated variables [24]. The elastic-net penalty can be expressed as [25]

$$\lambda \left[ \frac{1}{2}(1 - \alpha) \sum_{l=1}^{p} \beta_l^2 + \alpha \sum_{l=1}^{p} |\beta_l| \right] \qquad (11)$$

where $0 \le \alpha \le 1$ is a mixing parameter; $\alpha = 0$ corresponds to the ridge penalty and $\alpha = 1$ to the LASSO penalty. We explored the performance of the RDVS with the value of $\alpha$ varying from 0 to 1. It turns out that the RDVS framework with the AIC- or BIC-based importance measure performs equally well for all values of $\alpha$ except $\alpha = 0$, i.e. the pure ridge case. The ridge penalty ($\alpha = 0$) leads to a higher FDR than other $\alpha$ values, since it does not produce sparse solutions. We therefore maintain use of the L1-norm penalty ($\alpha = 1$) in this article for ease of interpretation.

We also applied a Bayesian procedure to fit CL models and assess variable importance by regression coefficients. The outcome $Y$ is assumed to follow a multinomial distribution, and the link function is given in Equation (10). For the purpose of variable selection, Laplace (or double-exponential) priors are used for the regression parameters $\beta_j$, $j \in \{1, \ldots, p^*\}$, as commonly seen in Bayesian LASSO models [4]. Specifically, the following priors are specified for the model parameters: $\alpha_{0k} \sim N(0, \sigma^2)$, where $\alpha_{01} < \alpha_{02} < \cdots < \alpha_{0,K-1}$; $\beta_j \sim \text{Double exponential}(0, \mu)$, $j \in \{1, \ldots, p^*\}$; and $\mu \sim \text{Gamma}(a, \text{rate} = b)$, where $(\alpha_{01}, \alpha_{02}, \ldots, \alpha_{0,K-1})$ and $(\beta_1, \beta_2, \ldots, \beta_{p^*})$ comprise the vectors $\alpha_k$ and $\beta$ in Equation (10), respectively. The absolute values of the posterior mean estimates for $\beta_j$ can be used to measure variable importance, i.e.

$$Z_j^{\mathrm{Bayesian}} = \left|\bar{\beta}_j\right| \qquad (12)$$

where $\bar{\beta}_j$ is the posterior mean, for $j \in \{1, \ldots, p^*\}$. Again, a larger $Z_j^{\mathrm{Bayesian}}$ means that $X_j$ is more important. Due to the computational inefficiency and convergence issues of MCMC, especially when $p^*$ is large, the results for the Bayesian measure are not presented in this article. However, a small-scale experiment with $p = 500$ showed that the Bayesian measure together with knockoffs could achieve decent power with the FDR properly controlled.

Marginal testing

Practitioners frequently resort to marginal screening when dealing with high-dimensional data. As a baseline for comparing the performance of multivariable methods and augmentation frameworks for variable selection, we applied several univariate testing approaches that produce marginal P-values for ordinal responses. In other words, the $p$ variables are tested one by one for unconditional associations with $Y$. The resulting marginal P-values are then used to control the FDR using the BH procedure [7], where variables with BH-adjusted P-values smaller than the pre-specified target FDR level $q$ are selected. Specifically, we examined three different marginal testing procedures. The Jonckheere–Terpstra test, which tests the equality of population medians among $K$ groups against ordered alternatives [26], was applied in a permutation version to estimate the P-value for each individual covariate, using the jonckheere.test function in the clinfun R package [27]. The other two marginal procedures fit univariate ordinal regression models, namely the CL model and the CR model. Both models were fit using the vglm function in the VGAM R package [28].
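For the marginal testing baseline, the BH adjustment and selection rule can be sketched as follows. The marginal P-values are assumed to be given already (in the article they come from Jonckheere–Terpstra tests or univariate CL and CR fits in R); `bh_adjust` and `marginal_select` are hypothetical helpers that mirror the standard step-up BH adjustment, and variables whose adjusted P-values are smaller than $q$ are selected.

```python
import numpy as np


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P-values (step-up)."""
    pvals = np.asarray(pvals, dtype=float)
    p = pvals.size
    order = np.argsort(pvals)[::-1]  # largest P-value first
    ranks = np.arange(p, 0, -1)      # ranks p, p-1, ..., 1 of those values
    adj = np.minimum.accumulate(pvals[order] * p / ranks)
    out = np.empty(p)
    out[order] = np.minimum(adj, 1.0)
    return out


def marginal_select(pvals, q=0.2):
    """Indices whose BH-adjusted marginal P-value is smaller than q."""
    return np.flatnonzero(bh_adjust(pvals) < q)


if __name__ == "__main__":
    # Hypothetical marginal P-values for p = 6 covariates.
    pvals = [0.001, 0.2, 0.004, 0.03, 0.6, 0.9]
    print(marginal_select(pvals, q=0.2))  # prints [0 2 3]
```

Compared with the BY adjustment used in the RDVS, BH omits the harmonic factor $\sum_{i=1}^{p} 1/i$, which is why it is less conservative but relies on independence or positive dependence among the P-values.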

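As a small illustration of the cumulative logit parameterization in Equation (10), the sketch below converts increasing intercepts $\alpha_k$ and a linear predictor $\beta^T x$ into the $K$ category probabilities by differencing the cumulative probabilities. The function name and inputs are hypothetical and chosen only for illustration.

```python
import numpy as np


def cl_probabilities(x, alpha, beta):
    """P(Y = 1 | x), ..., P(Y = K | x) under the CL model of Equation (10).

    alpha : increasing intercepts alpha_1 < ... < alpha_{K-1}
    beta  : coefficient vector with the same length as x
    """
    eta = np.asarray(alpha, dtype=float) + float(np.dot(beta, x))
    cum = 1.0 / (1.0 + np.exp(-eta))           # P(Y <= k | x), k = 1..K-1
    cum = np.concatenate([[0.0], cum, [1.0]])  # add P(Y <= 0) = 0, P(Y <= K) = 1
    return np.diff(cum)                        # P(Y = k) = P(Y <= k) - P(Y <= k-1)


if __name__ == "__main__":
    # Hypothetical K = 3 model with two covariates.
    probs = cl_probabilities(x=[0.5, -1.0], alpha=[-1.0, 1.0], beta=[0.8, 0.3])
    print(probs, probs.sum())
```

With the sign convention of Equation (10), a larger $\beta^T x$ raises every $P(Y \le k)$ and so shifts probability toward the lower categories; some software parameterizes the model as $\alpha_k - \beta^T x$ instead, which reverses how the signs of the coefficients are read.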
Simulation study

A simulation study was performed to compare different selection frameworks as well as different importance measures. The data-generating process is described below. The design matrix X contained n vectors of observed covariates, each following a p-dimensional Gaussian distribution $\mathrm{MVN}(0, \Sigma)$, where $\Sigma$ had an autoregressive structure with $(i, j)$th element $\rho^{|i-j|}$. We set the correlation parameter ρ to either 0.2 or 0.5 to reflect the small to moderate correlation among predictors that we might see in real genomic datasets. The sample size n was set to 300 and the number of predictors to p = 1000 to simulate the situation where p exceeds n. Among the p = 1000 variables, k = 50 variables were randomly selected to be important and truly associated with the outcome Y. For these 50 variables, the corresponding coefficients $\beta_j$ had signal amplitude A with random signs, i.e. $\beta_j \in \{\pm A\}$. The amplitude represents the signal strength and was varied from 0.2 to 1. The other 950 variables were simply 'null' noise variables with coefficients equal to zero. After obtaining the coefficient vector $\beta = (\beta_1, \beta_2, \ldots, \beta_p)^T$, we sampled a latent continuous response U through

$$U = X\beta + \varepsilon, \qquad (13)$$

where $\varepsilon$ followed a logistic(0, 1) distribution, so that $U \mid X$ followed a logistic$(X\beta, 1)$ distribution. The ordinal response Y with three levels was then generated by categorizing U at its 33.3rd and 66.7th percentiles. These simulation settings correspond to a CL model, and the percentiles correspond to the $\alpha_k$'s in Equation (10). We set the target FDR q at 20%, i.e. we aimed to keep the FDR at or below 20%, or at least close to it. For the RDVS framework, we repeated the model fitting M = 100 times.

Our goal was to compare base learners combined with one of the two selection frameworks described in Section 2.3 (model-X knockoffs and RDVS) with base learners used without any framework, in terms of FDR and power. The L1 penalized CR model and boosting introduced in Section 2.4 can perform variable selection without a framework. Examples are the non-zero coefficient estimates from the L1 penalized CR model tuned by AIC ($Z_j^{\mathrm{AIC}}$), BIC ($Z_j^{\mathrm{BIC}}$) or CV ($Z_j^{\mathrm{cv}}$), and the risk reductions from boosting ($Z_j^{\mathrm{mboost}}$). These measures can select variables because they are automatically sparse, i.e. only a small number of estimates are non-zero. We considered variables with non-zero importance measures to be 'selected' by these methods and all other variables to be unselected.

We then compared the FDR and power resulting from different frameworks and different importance measures. The FDR and power were plotted against the amplitude A to show how they change with signal strength. For each fixed A, the reported values were averaged over 30 repeated runs. All simulations were performed on the Ohio Supercomputer Center cluster Owens [29].

Real application

We applied the methods described to identify CpG sites whose methylation levels were associated with the progression from normal liver tissue to HCC. We used a subset of a dataset, available from Gene Expression Omnibus under accession number GSE18081, from a study of HCC in which liver tissue samples were assayed using the Illumina GoldenGate Methylation BeadArray Cancer Panel I [3]. The dataset includes the methylation percentages of 1505 CpG sites from 56 liver tissues:
• 20 tissues were from hepatitis C virus (HCV)-related HCC tumors and their adjacent non-tumorous HCV-cirrhotic tissues;
• 16 were independent HCV-cirrhotic tissues from patients without concomitant HCC;
• the remaining 20 were normal liver tissues.

The outcome can therefore be treated as an ordinal variable with three levels (normal < cirrhosis non-HCC < HCC). We modeled this ordinal outcome using the CpG site methylation percentages as predictors to identify a set of genes associated with the ordinal phenotype.

Before applying any methods, we removed CpG sites with missing values or extreme values (mean percent methylation above 95% or below 5%), leaving 1078 CpG sites for the analyses. We then applied the different selection methods to the dataset, as in the simulation study. To be consistent with the simulations, we set the target FDR q to 20%. The models were fit M = 100 times when using the RDVS framework. Genes associated with CpG sites selected by the different framework-measure combinations were recorded for comparison.

The original study [3] used marginal one-sided Jonckheere–Terpstra tests and the q-value method [30] to control the FDR at 10%. It identified 235 CpG sites with a significant increasing trend and 266 sites with a significant decreasing trend in methylation proportion across tissues ranging from normal, to HCV-cirrhosis, to HCV-HCC. To assess the consistency of each method considered in this article with those previous findings, we calculated the proportion of CpGs selected by each method that also appeared in their results. The significant list given by Archer et al. [3] does not necessarily represent the ground truth. If it did, the calculated proportion could be used to assess our methods in terms of the true positive proportion (1 − FDR).

Results

Simulation study

Figures 1–4 present the trends of the FDR and power with varying signal amplitude (A) for four approaches:
• the base learners (with neither the RDVS nor the knockoff framework applied), in Figure 1;
• the BH procedure with marginal P-values, in Figure 2;
• the RDVS framework, in Figure 3;
• the knockoff framework, in Figure 4.

In these plots, different lines correspond to the different variable importance measures defined in Section 2.4. The top panels present the case of slightly correlated covariates (autocorrelation ρ = 0.2), and the bottom panels present the moderately correlated case (ρ = 0.5). Each error bar indicates one standard deviation of the corresponding quantity over 30 replications. The x-axis (signal amplitude A) was slightly jittered so that the error bars for each method are distinguishable.

Before any selection framework was applied, the FDR was relatively high, as shown in Figure 1, especially for the AIC- and CV-based selection methods. The corresponding power for these methods was high as well, which indicates that they are quite liberal. In contrast, the BIC-based method was fairly conservative, with both FDR and power close to zero. The power increased with the signal strength, since the selection methods can better distinguish signal variables from noise variables when the signal is stronger. As the covariate correlation increased from ρ = 0.2 to ρ = 0.5, the power was slightly lower, but the difference was small.

For the marginal models adjusted by the BH procedure, shown in Figure 2, the Jonckheere–Terpstra test appeared to be less powerful than the univariate CL and CR models. The FDR of the
univariate CL and CR models was well controlled at the target level of 20% for the slightly correlated datasets (ρ = 0.2) but not for the moderately correlated ones (ρ = 0.5). This suggests that the FDR control of the BH procedure becomes unstable as the dependence gets stronger. The power still increased as the signal amplitude grew, but it reached only a low level (around 20%) at the largest amplitude.

Figure 1. FDR (left) and power (right) for L1 penalized CR models tuned by AIC, BIC or CV, and for boosting with ordinal responses (base learners). The design matrix has an AR(1) covariance structure with autocorrelation 0.2 (top) and 0.5 (bottom). The error bars indicate ± one standard deviation over 30 replications. The signal amplitude was jittered to make error bars distinguishable.

Compared with the base learners, the RDVS framework reduced the FDR for the AIC and CV measures, as shown in Figure 3. However, only the CV measure in the slightly correlated scenario (ρ = 0.2) yielded strict FDR control at 20%. The AIC, path-based and ordinal forest measures produced relatively high FDR, although the FDR decreased as the amplitude increased. The BIC and boosting measures, which selected almost all variables, completely failed to control the FDR. Compared with the marginal methods in Figure 2, one reason for this failure could be that the empirical P-values produced by these measures were not uniformly distributed under the null; instead, their distribution was concentrated near zero. The CV measure with ρ = 0.2 did control the FDR and obtained relatively high power (around 40%). However, the resulting FDPs were highly variable, as the long error bars show. In this case, FDR control is of limited use, since the realized FDP has an unacceptably wide range.

With the knockoff framework, Figure 4 shows that the average FDR for all six measures was well controlled at or below 20% in both correlation scenarios. The FDR increased with the amplitude rather than remaining at the nominal level as in the BH procedure. Because the FDR stays well below 20% when the signals are weak, knockoffs are preferable in the regime where the signal strength is so low that we have little chance of identifying the true signals. The error bars are shorter than those of the other methods, indicating less variable FDP and more reliable FDR control. The power increased with the amplitude, ultimately reaching around 20% for some of the measures. Two importance measures from the L1 penalized CR model (AIC and CV) yielded greater power than the others.

The selection methods for the base learners in Figure 1 were more computationally efficient than either selection framework, since they need neither augmentation of the design matrix nor repeated model fitting. The knockoff framework appeared to run faster than RDVS for most of the importance measures, because the models are fit only once within the knockoff framework. This may not hold if the importance measure does not scale well to high-dimensional data. The knockoff framework uses n × 2p augmented matrices, while RDVS uses n × (p + 1) matrices, which can make a large difference when the original dimension p is high. Within a fixed framework, the measures
from the L1 penalized CR models (AIC, BIC, CV and path) were computationally faster than the ordinal forest measure, which in turn was faster than the boosting measure. Among the marginal methods, the univariate CL and CR models were slightly faster than the Jonckheere–Terpstra tests.

Figure 2. FDR (left) and power (right) for marginal tests with BH adjustment. The design matrix has an AR(1) covariance structure with autocorrelation 0.2 (top) and 0.5 (bottom). The error bars indicate ± one standard deviation over 30 replications. The signal amplitude was jittered to make error bars distinguishable.

Real application

Figure 5 shows the selection results for the real dataset across methods. A cross mark indicates that the gene (mapping to the selected CpG site) in the corresponding column was selected by the method in the row. The genes were sorted by the number of times they were selected. Some methods, including the three marginal methods and some RDVS methods, were far too liberal and selected many more genes than the other methods. Owing to space limitations, only genes selected by more than eight methods are presented. Genes reported as significant in the original study [3] appear in pink if their methylation proportion had an increasing trend for progression to HCC, and in blue if it had a decreasing trend. For each method, we calculated the proportion of its selected genes that were also identified as significant in the original article [3]; these proportions are listed to the right of Figure 5, and the methods are sorted by them in descending order. The boosting-based method and the 'knockoff + boosting' combination had the highest proportion, 100%, meaning that all of their findings had been identified in the previous study. They were followed by some RDVS methods and L1 penalized methods. The marginal Jonckheere–Terpstra tests gave results that differed from the previous findings, because this article applied a more liberal FDR threshold of 20% (rather than 10%) and two-sided tests. Overall, our results agree well with the previous findings [3]. Because error-controlling procedures were used, these discoveries may be reproducible in independent validation studies.

Other evidence supports the roles of the selected genes in liver cancer development. The most frequently identified gene, endoplasmic reticulum to nucleus signaling 1 (ERN1), also known as IRE1, has been shown to play an important role in ER stress-activated autophagy, which protects HCC cells from death [31]. The myeloperoxidase (MPO) gene is known to be associated with myeloperoxidase deficiency and Alzheimer disease, and a certain MPO genotype has been found to increase the risk of HCC occurrence and liver-related death [32]. DNA damage inducible transcript 3 (DDIT3) encodes a transcription factor that regulates cell migration, proliferation, apoptosis and survival in many cancers [33–35]. Previous research has suggested that differentially methylated DDIT3 may lead to liver carcinogenesis [36]. There is weak evidence that glycosylphosphatidylinositol anchored molecule like (GML) is differentially expressed between HCC and non-HCC
hepatitis/cirrhosis [37]. There is also evidence that some other commonly identified genes are relevant to HCC, including CDKN2B (or P15) [38], HLA-DPA1 and IL8 [39], IL16 [40], TJP2 [41], SOX17 [42], PADI4 [43] and DLC-1 [44]. The DLC-1 gene functions as a tumor suppressor gene in a number of common cancers, including HCC.

Figure 3. FDR (left) and power (right) using the RDVS framework with various importance measures. The design matrix has an AR(1) covariance structure with autocorrelation 0.2 (top) and 0.5 (bottom). The error bars indicate ± one standard deviation over 30 replications. The signal amplitude was jittered to make error bars distinguishable.

Figure 4. FDR (left) and power (right) using the knockoff framework with various importance measures. The design matrix has an AR(1) covariance structure with autocorrelation 0.2 (top) and 0.5 (bottom). The error bars indicate ± one standard deviation over 30 replications. The signal amplitude was jittered to make error bars distinguishable.

Figure 5. Genes mapping to CpG sites identified as important for HCC by more than eight methods. A cross mark indicates that the gene in the corresponding column was selected by the method in the row. The knockoff and ordinal forest combination did not select any variable and was thus omitted. Pink columns indicate genes that the original study [3] identified as having increasing trends for progression to HCC, and blue columns indicate genes with decreasing trends. The numbers to the right give the proportion of each method's discoveries that were identified in the original results [3].

Discussion and conclusion

In this article, we reviewed two variable selection frameworks, RDVS and knockoffs, together with several variable importance measures designed for high-dimensional ordinal outcome data. Both frameworks augment the data with artificial null variables that serve as benchmarks for variable selection. Both are model-free and applicable to many well-designed importance measures for various outcome types, including ordinal data.

Knockoffs are more mathematically rigorous, because they base the definitions of the FDR and of the hypotheses on conditional independence. They therefore guarantee exact finite-sample FDR control for the problem of interest. The present article verified this empirically for ordinal outcomes, as previous literature did for continuous or binary outcomes [9, 10, 19]. The RDVS framework, whose hypotheses are marginal in flavor, showed some inflation of the FDR, except when the CV-based measure was used in the slightly correlated scenario.

Among the different importance measures, those based on the penalized CR model (AIC, path and CV) were relatively powerful and computationally efficient, and we recommend them for future research in ordinal response modeling. In the real application, several genes mapping to selected CpG sites were identified as potentially associated with HCC. These findings have important implications for the genes' roles in liver cancer and may help to discover novel biomarkers and mechanisms of HCC development.

We followed the second-order approximation approach [10] to construct knockoff variables in this article. However, the approximation relies on the simplifying assumption that the covariate vector (each row of X) is well described by a p-dimensional multivariate normal distribution. This may not hold for data such as methylation percentages of CpG sites. In addition, the approach is occasionally unstable for complex datasets and causes power loss in the subsequent selection procedure. Novel approaches using deep generative models have been developed to relax the distributional assumption and generate valid knockoffs in general settings [45–47]. Generative adversarial networks have been adapted for knockoff generation, involving four distinct, interacting neural networks [45]. Liu and Zheng [46] proposed an alternative with less computational burden, based on the variational autoencoder [48]. In parallel, Romano et al. [47] introduced a deep knockoff machine that moves beyond the second-order approximation and matches higher-order moments of $(X, \tilde{X})$ through generative moment
matching networks. Future research could apply these novel construction approaches to obtain more accurate and powerful knockoffs for complex datasets such as high-throughput methylation data.

Other relevant variable selection techniques include permutation methods and the conditional randomization test [10]. A permutation method resamples the observed data points to construct a sampling distribution of test statistics under the null hypothesis. Given this empirical null distribution, the FDR can be estimated or controlled, mostly in the marginal modeling regime [49, 50]. The 'fixed-X' knockoff article [9] compared the permutation method with the knockoff method empirically and showed that the permutation method lost FDR control when covariates displayed non-vanishing correlations.

The conditional randomization test [10] is somewhat related to the RDVS framework. However, RDVS obtains the sampling distribution of statistics for a variable marginally independent of both X and Y, whereas the conditional randomization test computes the sampling distribution for each variable under the null hypothesis of conditional independence. Assuming the covariate distribution $F_X$ is known, the empirical null distribution is obtained by independently sampling each variable conditional on all the other variables and recomputing the statistic with the sampled value. Conditional randomization P-values can then be calculated from this empirical distribution. Unlike RDVS, this method allows rigorous testing of conditional hypotheses. The cost is that it requires knowing the covariate distribution and sampling repeatedly from the conditional distributions of all covariates, which is computationally expensive.

One shortcoming of this article is that the present simulation study sacrificed a great deal of power for error control. The controlled selection problem for high-dimensional, sparse, correlated ordinal response data is difficult in itself, but designing powerful variable importance measures suited to the problem would help boost the power. In addition, we did not use one consistent ordinal regression model but several, including the CR and CL models, which makes the different measures less comparable. The simulation design is essentially based on the CL model, so it might favor the CL-based importance measures ($Z_j^{\mathrm{uni\_cumulative}}$, $Z_j^{\mathrm{mboost}}$ and $Z_j^{\mathrm{Bayesian}}$) over the others. However, the simulation results showed no obvious advantage for these measures, which may ease this concern. Further work could unify the choice of models, although any change in results is expected to be negligible.

The R programs for producing the results in this article are available at https://github.com/hanfu-bios/ordinalVS.

Key Points

• Compare the performance of two existing variable selection frameworks with traditional selection methods.
• Describe multiple variable importance measures for ordinal outcomes to make the frameworks applicable to ordinal response data.
• Identify features associated with the progression from normal liver tissue to hepatocellular carcinoma using high-throughput methylation data.
• The knockoff framework strictly controls the FDR at the target value, while the RDVS framework fails to do so in most cases.

Acknowledgements

We would like to thank Jared Huling for helpful discussions and useful comments about the methods considered in this article. We would also like to thank the reviewers for valuable feedback and suggestions regarding an earlier version of this article.

Funding

Research reported in this publication was supported by the National Cancer Institute of the National Institutes of Health under Award Number R03CA245771. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

References

1. Forrest M, Andersen B. Ordinal scale and statistics in medical research. Br Med J (Clin Res Ed) 1986;292(6519):537–8.
2. Archer KJ, Williams AA. L1 penalized continuation ratio models for ordinal response prediction using high-dimensional datasets. Stat Med 2012;31(14):1464–74.
3. Archer KJ, Mas VR, Maluf DG, et al. High-throughput assessment of CpG site methylation for distinguishing between HCV-cirrhosis and HCV-associated hepatocellular carcinoma. Mol Genet Genomics 2010;283(4):341–9.
4. Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc B Methodol 1996;58(1):267–88.
5. Bühlmann P, Hothorn T. Boosting algorithms: regularization, prediction and model fitting. Stat Sci 2007;22(4):477–505.
6. Fan J, Lv J. Sure independence screening for ultrahigh dimensional feature space. J R Stat Soc Series B Stat Methodology 2008;70(5):849–911.
7. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc B Methodol 1995;57(1):289–300.
8. Benjamini Y, Yekutieli D. The control of the false discovery rate in multiple testing under dependency. The Annals of Statistics 2001;29(4):1165–88.
9. Barber RF, Candès EJ. Controlling the false discovery rate via knockoffs. The Annals of Statistics 2015;43(5):2055–85.
10. Candès EJ, Fan Y, Janson L, et al. Panning for gold: ‘model-X’ knockoffs for high dimensional controlled variable selection. J R Stat Soc Series B Stat Methodology 2018;80(3):551–77.
11. Linkletter C, Bingham D, Hengartner N, et al. Variable selection for Gaussian process models in computer experiments. Technometrics 2006;48(4):478–90.
12. Pearl J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Francisco, CA: Morgan Kaufmann Publishers, 1988.
13. Edwards D. Introduction to Graphical Modelling. New York, NY: Springer, 2000.
14. Patterson E, Sesia M. Knockoff: The Knockoff Filter for Controlled Variable Selection. R package version 0.3.2. https://CRAN.R-project.org/package=knockoff, 2018.
15. Sesia M, Sabatti C, Candès EJ. Rejoinder: ‘gene hunting with hidden Markov model knockoffs’. Biometrika 2019;106(1):35–45.
16. Greenland S. Alternative models for ordinal logistic regression. Stat Med 1994;13(16):1665–77.
17. Hothorn T, Buehlmann P, Kneib T, et al. mboost: Model-Based Boosting, R Package Version 2.9-1, https://CRAN.R-project.org/package=mboost, 2018.
18. Schmid M, Hothorn T. Boosting additive models using component-wise P-splines. Comput Stat Data Anal 2008;53(2):298–311.
19. Shen A, Fu H, He K, et al. False discovery rate control in cancer biomarker selection using knockoffs. Cancers 2019;11(6):744.
20. Breiman L. Random forests. Mach Learn 2001;45(1):5–32.
21. Janitza S, Tutz G, Boulesteix AL. Random forest for ordinal responses: prediction and variable selection. Comput Stat Data Anal 2016;96:57–73.
22. Hornung R. Ordinal forests. J Classif 2019;1–14. doi: https://doi.org/10.1007/s00357-018-9302-x
23. Hornung R. ordinalForest: Ordinal Forests: Prediction and Variable Ranking with Ordinal Target Variables, R Package Version 2.3-1, https://CRAN.R-project.org/package=ordinalForest, 2019.
24. Zou H, Hastie T. Regularization and variable selection via the elastic net. J R Stat Soc Series B Stat Methodology 2005;67(2):301–20.
25. Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw 2010;33(1):1.
26. Jonckheere AR. A distribution-free k-sample test against ordered alternatives. Biometrika 1954;41(1/2):133–45.
27. Seshan VE. clinfun: Clinical Trial Design and Data Analysis Functions, R Package Version 1.0.15, https://CRAN.R-project.org/package=clinfun, 2018.
28. Yee TW. Vector Generalized Linear and Additive Models: With an Implementation in R. New York, NY: Springer, 2015.
29. Ohio Supercomputer Center. Columbus, OH: 1987. Available from: http://osc.edu/ark:/19495/f5s1ph73.
30. Storey JD, Tibshirani R. Statistical significance for genomewide studies. Proc Natl Acad Sci 2003;100(16):9440–5.
31. Hu F, Han J, Zhai B, et al. Blocking autophagy enhances the apoptosis effect of bufalin on human hepatocellular carcinoma cells through endoplasmic reticulum stress and JNK activation. Apoptosis 2014;19(1):210–23.
32. Nahon P, Sutton A, Rufat P, et al. Myeloperoxidase and superoxide dismutase 2 polymorphisms comodulate the risk of hepatocellular carcinoma and death in alcoholic cirrhosis. Hepatology 2009;50(5):1484–93.
33. Jauhiainen A, Thomsen C, Strömbom L, et al. Distinct cytoplasmic and nuclear functions of the stress induced protein DDIT3/CHOP/GADD153. PLoS ONE 2012;7(4):e33208.
34. Marciniak SJ, Yun CY, Oyadomari S, et al. CHOP induces death by promoting protein synthesis and oxidation in the stressed endoplasmic reticulum. Genes Dev 2004;18(24):3066–77.
35. He K, Zheng X, Li M, et al. mTOR inhibitors induce apoptosis in colon cancer cells via CHOP-dependent DR5 induction on 4E-BP1 dephosphorylation. Oncogene 2016;35(2):148–57.
36. Li CW, Chang PY, Chen BS. Investigating the mechanism of hepatocellular carcinoma progression by constructing genetic and epigenetic networks using NGS data identification and big database mining method. Oncotarget 2016;7(48):79453–73.
37. Yang B, Guo M, Herman JG, et al. Aberrant promoter methylation profiles of tumor suppressor genes in hepatocellular carcinoma. Am J Pathol 2003;163(3):1101–7.
38. Wang L, Sun L, Huang J, et al. Cyclin-dependent kinase inhibitor 3 (CDKN3) novel cell cycle computational network between human non-malignancy associated hepatitis/cirrhosis and hepatocellular carcinoma (HCC) transformation. Cell Prolif 2011;44(3):291–9.
39. Budhu A, Forgues M, Ye QH, et al. Prediction of venous metastases, recurrence, and prognosis in hepatocellular carcinoma based on a unique immune response signature of the liver microenvironment. Cancer Cell 2006;10(2):99–111.
40. Li S, Deng Y, Chen ZP, et al. Genetic polymorphism of interleukin-16 influences susceptibility to HBV-related hepatocellular carcinoma in a Chinese population. Infect Genet Evol 2011;11(8):2083–8.
41. Laquaglia MJ, Grijalva JL, Mueller KA, et al. YAP subcellular localization and hippo pathway transcriptome analysis in pediatric hepatocellular carcinoma. Sci Rep 2016;6:30238.
42. Jia Y, Yang Y, Liu S, et al. SOX17 antagonizes WNT/β-catenin signaling pathway in hepatocellular carcinoma. Epigenetics 2010;5(8):743–9.
43. Chang X, Han J, Pang L, et al. Increased PADI4 expression in blood and tissues of patients with malignant tumors. BMC Cancer 2009;9:40.
44. Wong CM, Lee JM, Ching YP, et al. Genetic and epigenetic alterations of DLC-1 gene in hepatocellular carcinoma. Cancer Res 2003;63(22):7646–51.
45. Jordon J, Yoon J, Schaar M. KnockoffGAN: generating knockoffs for feature selection using generative adversarial networks. International Conference on Learning Representations, New Orleans, LA: OpenReview, 2019.
46. Liu Y, Zheng C. Auto-encoding knockoff generator for FDR controlled variable selection. arXiv preprint arXiv:1809.10765, 2018.
47. Romano Y, Sesia M, Candès EJ. Deep knockoffs. arXiv preprint arXiv:1811.06687, 2018. doi: 10.1080/01621459.2019.1660174
48. Kingma DP, Welling M. Auto-encoding variational Bayes. International Conference on Learning Representations, 2014. arXiv:1312.6114.
49. Xie Y, Pan W, Khodursky AB. A note on using permutation-based false discovery rate estimates to compare different analysis methods for microarray data. Bioinformatics 2005;21(23):4280–8.
50. Yang YH, Lin WY, Lee WCA. Fuzzy permutation method for false discovery rate control. Sci Rep 2016;6:28507.
