

Bioinformatics, 35(1), 2019, 95–103

doi: 10.1093/bioinformatics/bty537
Advance Access Publication Date: 2 July 2018
Original Paper

Systems biology

Multiomics modeling of the immunome, transcriptome, microbiome, proteome and metabolome adaptations during human pregnancy
Mohammad Sajjad Ghaemi1,2,3,4, Daniel B. DiGiulio5,6,
Kévin Contrepois7, Benjamin Callahan5,8, Thuy T. M. Ngo9,10,
Brittany Lee-McMullen7, Benoit Lehallier11, Anna Robaczewska5,6,
David Mcilwain12, Yael Rosenberg-Hasson13, Ronald J. Wong14,
Cecele Quaintance14, Anthony Culos1, Natalie Stanley1,
Athena Tanada1, Amy Tsai1, Dyani Gaudilliere1, Edward Ganio1,
Xiaoyuan Han1, Kazuo Ando1, Leslie McNeil1, Martha Tingle1,
Paul Wise14, Ivana Maric14, Marina Sirota15,16, Tony Wyss-Coray11,
Virginia D. Winn17, Maurice L. Druzin17, Ronald Gibbs17,
Gary L. Darmstadt14, David B. Lewis14, Vahid Partovi Nia2,3,
Bruno Agard2,4, Robert Tibshirani18,19, Garry Nolan12,
Michael P. Snyder7, David A. Relman5,6,12, Stephen R. Quake9,
Gary M. Shaw14, David K. Stevenson14, Martin S. Angst1,†,
Brice Gaudilliere1,† and Nima Aghaeepour1,*,†
1 Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, CA, 94305, USA
2 Département de Mathématiques et de Génie Industriel, École Polytechnique de Montréal, QC, H3T 1J4, Canada
3 Groupe d’Études et de Recherche en Analyse des Décision (GERAD), Montréal, QC, H3T 1J4, Canada
4 Centre Interuniversitaire de Recherche sur les Réseaux d’Entreprise, la Logistique et le Transport (CIRRELT), Montréal, QC, H3T 1J4, Canada
5 Department of Medicine, Stanford University School of Medicine, Stanford, CA, 94305-5101, USA
6 Veterans Affairs Palo Alto Health Care System, Palo Alto, CA, 94304, USA
7 Department of Genetics, Stanford University School of Medicine, Stanford, CA, 94305-5120, USA
8 Department of Population Health and Pathobiology, College of Veterinary Medicine, North Carolina State University, Raleigh, NC, 27607, USA
9 Department of Bioengineering, Stanford University, Stanford, CA, 94305, USA
10 Cancer Early Detection Advanced Research Center, Knight Cancer Institute and Department of Molecular and Medical Genetics, Oregon Health Sciences University, Portland, OR, 97239-3098, USA
11 Department of Neurology and Neurological Sciences, Stanford University School of Medicine, Stanford, CA, 94304, USA
12 Department of Microbiology and Immunology, Stanford University, Stanford, CA, 94305-5124, USA
13 Institute for Immunity, Transplantation and Infection, Human Immune Monitoring Center Stanford, CA, 94305, USA
14 Division of Neonatology, Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, 94304, USA
15 Institute for Computational Health Sciences, University of California San Francisco, San Francisco, CA, 94158, USA
16 Department of Pediatrics, University of California San Francisco, San Francisco, CA, 94143, USA
17 Department of Obstetrics and Gynecology, Stanford University School of Medicine, Stanford, CA, 94304, USA
18 Departments of Biomedical Data Sciences and Statistics, Stanford University, Stanford, CA, 94305-5464, USA
19 Department of Statistics, Stanford University School of Medicine, Stanford, CA, 94305-4020, USA
*To whom correspondence should be addressed.

© The Author(s) 2018. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted
reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Associate Editor: Jonathan Wren



The authors wish it to be known that, in their opinion, the last three authors should be regarded as Joint Co-senior
Authors.
Received on January 24, 2018; revised on June 22, 2018; editorial decision on June 26, 2018; accepted on July 2, 2018

Abstract
Motivation: Multiple biological clocks govern a healthy pregnancy. These biological mechanisms
produce immunologic, metabolomic, proteomic, genomic and microbiomic adaptations during the
course of pregnancy. Modeling the chronology of these adaptations during full-term pregnancy
provides the frameworks for future studies examining deviations implicated in pregnancy-related
pathologies including preterm birth and preeclampsia.
Results: We performed a multiomics analysis of 51 samples from 17 pregnant women, delivering
at term. The datasets included measurements from the immunome, transcriptome, microbiome,
proteome and metabolome of samples obtained simultaneously from the same patients.
Multivariate predictive modeling using the Elastic Net (EN) algorithm was used to measure the abil-
ity of each dataset to predict gestational age. Using stacked generalization, these datasets were
combined into a single model. This model not only significantly increased predictive power by
combining all datasets, but also revealed novel interactions between different biological modal-
ities. Future work includes expansion of the cohort to preterm-enriched populations and in vivo
analysis of immune-modulating interventions based on the mechanisms identified.
Availability and implementation: Datasets and scripts for reproduction of results are available
through: https://nalab.stanford.edu/multiomics-pregnancy/.
Contact: [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.

1 Introduction

Physiological changes during pregnancy are highly dynamic and involve coordinated changes among multiple interconnected molecular and cellular systems from the fetus, the fetal-membrane and the mother (Diemert and Arck, 2018; Menon et al., 2016). The simultaneous interrogation of these systems can reveal otherwise unrecognized crosstalk. Understanding such crosstalk can inform several lines of investigation. From a biological perspective, it can point to important disease mechanisms such as immune programming by the microbiome, or specific interactions between proteins and cellular elements (Aghaeepour et al., 2017; Dethlefsen et al., 2007). From a diagnostic perspective, it can reveal biomarkers from several biological domains that provide higher predictive power if combined. Alternatively, it can point to alternative biomarkers in an accessible biological compartment, which can replace biomarkers that are difficult to obtain or expensive to measure.

Recent technological advances in science provide novel opportunities to unravel the complex biology of pregnancy. A particularly pressing issue is to identify the biological pathways and the converging pathological processes that lead to preterm birth (Lackritz et al., 2013). Preterm birth is the major cause of neonatal death, and the second leading cause of mortality in children under the age of 5 years (Liu et al., 2012). An ongoing cohort study by the March of Dimes Prematurity Research Center at Stanford University exploits recent technological advances to examine an array of biological, demographic, clinical and environmental factors associated with normal and pathological pregnancies (Stevenson et al., 2013; Shaw et al., 2018; Wise et al., 2017). From a biological perspective, this effort has so far produced two major lines of evidence. One line sheds light onto precisely tuned chronological changes that occur during normal pregnancy. For example, a highly multiplexed cell-based assay in whole blood revealed an ‘immunological clock’ of human pregnancy that predicts gestational age at the time of sampling (Aghaeepour et al., 2017). Similar results were reported in a longitudinal analysis of cell-free, maternal RNA (Pan et al., 2017) and plasma proteins (Aghaeepour et al., 2018). The primary objective of using gestational age as the clinical outcome in these studies is to extract molecular features that best capture normal chronological changes over the course of term pregnancy. Such knowledge will elucidate molecular deviations that are associated with pregnancy-related pathologies. The second line of this work points to important pathophysiological derangements. For example, dense longitudinal sampling of the vaginal microbiome revealed community composition profiles associated with preterm birth that were validated in an independent cohort (Callahan et al., 2017; DiGiulio et al., 2015). However, the important work of bringing these data modalities together has remained unexplored.

From a bioinformatics point of view, current multiomics efforts belong to two categories generally known as multi-staged and meta-dimensional (Ritchie et al., 2015; Rohart et al., 2017). In multi-staged analyses, measurements of the same biological factors (e.g. genes) are integrated at various biological levels and using different technological platforms (e.g. DNA and RNA sequencing, epigenetic analysis and proteomics assays); notable examples include Emilsson et al. (2008), Maynard et al. (2008), Schadt et al. (2005), Shabalin (2012) and Shen et al. (2009). However, recent biological studies extend well beyond just measurements of the same gene/protein and include various assays that cannot be mapped to a single gene. These include single cell analysis (Aghaeepour et al., 2017), imaging (Woodward et al., 2006), metabolic profiling (Piening et al., 2018), actigraphy using wearable sensors (Halilaj et al., 2018) and clinical phenotypes (Ferrero et al., 2016). Meta-dimensional multiomics approaches are now emerging that aim to combine heterogeneous datasets to identify key factors at various biological
levels, their interactions with each other, and with clinical outcomes. Some studies achieve this by simply merging all available datasets into a single matrix for joint modeling (Fridley et al., 2012; Holzinger et al., 2014; Mankoo et al., 2011). These approaches are often susceptible to biases introduced by the differential sizes, modularities, scalings and batch effects of the included datasets. Various kernel (e.g. Borgwardt et al., 2005) and graph (e.g. Kim et al., 2012) transformations as well as latent space projections (Singh et al., 2016) have been proposed to address these biases. In settings where analysis is performed against an external factor, an alternative is to use mixture-of-experts methods to combine the results of independent models produced using each dataset through various algorithms ranging from voting (e.g. Aghaeepour and Hoos, 2013) to integration of posterior Bayesian probabilities (Akavia et al., 2010; Zhu et al., 2008, 2012).

The main objective of this study was to test multiple strategies for integrating transcriptomic, immunological, microbiomic, metabolomic and proteomic datasets into different statistical models predicting gestational age in term pregnancy and identify the most accurate strategy. A final objective was to interrogate the derived model for novel and testable biological hypotheses.

2 Materials and methods

2.1 Study design
Pregnant women presenting to the obstetrics clinics of the Lucile Packard Children’s Hospital at Stanford University for prenatal care were invited to participate in a cohort study to prospectively examine environmental and biological factors associated with normal and pathological pregnancies. Women were eligible if they were at least 18 years of age and in their first trimester of a singleton pregnancy. In 17 women, three samples were collected during pregnancy and a fourth one after delivery. The time points were chosen such that a peripheral blood sample (CyTOF analysis), a plasma sample (proteomic, cell-free transcriptomics, metabolomics analyses), a serum sample (Luminex analyses) and a series of culture swabs (microbiome analysis) were simultaneously collected from each woman during the first (7–14 weeks), second (15–20 weeks) and third (24–32 weeks) trimester of pregnancy and 6 weeks postpartum. Repeated sampling during pregnancy allowed assessing important biological adaptations occurring continuously from the early phases of fetal development (first trimester) to the late phases of gestation (third trimester). The sample collected 6 weeks postpartum allowed for the assessment of the biological variables after the delivery of the fetus, a surrogate for the non-pregnant state which is not accessible in a prospective study of pregnant women.

2.2 Gestational age estimation
Gestational age was determined by best obstetrical estimate as recommended by the American College of Obstetricians and Gynecologists (Hershey, 2014).

2.3 Biological assays
Plasma and serum samples were assayed using the Luminex platform for cytokine levels. In addition, plasma samples were used for proteomics analysis, LC-MS metabolomics analysis, and cell-free transcriptomic analysis. Whole blood samples were analyzed using mass cytometry for single-cell characterization of the immune system. Finally, vaginal swabs, stool, saliva and tooth/gum samples were used for microbiomic profiling. See Supplementary Material for more detailed description of the assays. All timepoints of a given patient were analyzed simultaneously by all omics platforms to minimize systematic technical confounders (Supplementary Fig. S4).

2.4 Multivariate modeling
For a matrix $X$ of all features from a given dataset, and a vector of estimated gestational ages at the time of each sampling, $Y$, the EN algorithm calculates coefficients $\beta$ to minimize the error term $L(\beta) = \|Y - X\beta\|^2$. An L1 regularization (Tibshirani, 1996) was used to increase model sparsity (which facilitates biological interpretation and validation). However, this approach is not ideal for the analysis of the highly interrelated biological datasets, because it only selects representatives of communities of highly correlated features. As a result, features correlated to these selected representatives are disregarded, despite the fact that they could be biologically relevant. This limitation is addressed by using an additional L2 regularization penalty:

$$L(\alpha, \lambda, \beta) = \|Y - X\beta\|^2 + \lambda\left[(1 - \alpha)\|\beta\|_2 + \alpha\|\beta\|_1\right],$$

where $\|\beta\|_2 = \beta^\top \beta$ and $\|\beta\|_1 = \sum_{i=1}^{n} |\beta_i|$. The subset selecting factor $\lambda$ controls the sparsity of the model and the smoothing factor $\alpha$ controls the smoothing of selection from correlated variables (Zou and Hastie, 2005).

2.5 Stacked generalization
In the computer science literature, stacked generalization refers to the practice of combining several weak predictors for increased predictive power (Breiman, 1996; Sharkey, 1996; Wolpert, 1992). In life sciences, this often translates to analysis of a single dataset using multiple algorithms and then combining the results in a final multivariate modeling step (Ge and Wong, 2008; He et al., 2013; Larranaga et al., 2006; Yang et al., 2010; Wang et al., 2006). Here we expand this concept to multiomics analysis where a single multivariate analysis algorithm (EN) is used on a cohort of patients, and the variable factor is the biological assays used for developing the datasets. First, an EN model is constructed on each dataset from the same subjects. Then, all estimations of gestational age at time of sampling are used as features for a final EN model. This, essentially, is a weighted average of the individual models where the weights are the coefficients of the EN model.

2.6 Cross-validation
An underlying assumption of the EN algorithm is statistical independence between all observations. In this analysis, while the subjects are independent, the samples collected from various trimesters of the same subject are not. To account for this, we designed a leave-one-subject-out cross-validation strategy. In this setting, a model is trained on all available samples except for the three trimesters of a given subject. The model is then tested on all samples of the subject that it was blinded to. This process is repeated for all subjects until a blinded prediction has been produced for all samples. Final results are reported using these blinded predictions. This ensures complete independence from any intra-subject correlations.

A two-layer cross-validation strategy was implemented for simultaneous free-parameter optimization and analysis of the generalizability of the results (Fig. 2A). The inner layer selects the best values of $\alpha$ and $\lambda$ (see Supplementary Fig. S1). The outer layer ensures that performance is reported on subjects that the models were blinded to during training.

A similar strategy was used for the stacked generalization step. Cross-validation folds were synchronized between the individual models from each dataset and the integrated model to leave out the same set of data points at all levels of the analysis. Importantly, this guarantees that not only the stacked generalization model, but also its input features (i.e. the final predictions from each dataset) were blinded to the same subject during cross-validation.
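
The two-layer, leave-one-subject-out procedure of Section 2.6 can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`loso_folds`, `fit_ridge_1d`, `nested_loso_predict`), the toy data and the one-feature ridge model used as a stand-in for the EN model are all assumptions made for illustration. The outer loop holds out every sample of one subject; the inner loop repeats the same scheme on the remaining subjects to choose a penalty.

```python
from statistics import mean


def loso_folds(subjects):
    """Yield (train_idx, test_idx), holding out all samples of one subject per fold."""
    for held_out in sorted(set(subjects)):
        test = [i for i, s in enumerate(subjects) if s == held_out]
        train = [i for i, s in enumerate(subjects) if s != held_out]
        yield train, test


def fit_ridge_1d(x, y, lam):
    """Hypothetical stand-in for the EN model: one-feature ridge regression."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / (sxx + lam)
    return lambda v: my + slope * (v - mx)


def nested_loso_predict(x, y, subjects, grid):
    """Return one blinded prediction per sample using two-layer LOSO CV."""
    preds = [None] * len(y)
    for train, test in loso_folds(subjects):
        xt = [x[i] for i in train]
        yt = [y[i] for i in train]
        st = [subjects[i] for i in train]

        def inner_error(lam):
            # Inner layer: squared error on subjects held out from the
            # outer training set only; the outer test subject is never seen.
            err = 0.0
            for tr, te in loso_folds(st):
                model = fit_ridge_1d([xt[i] for i in tr], [yt[i] for i in tr], lam)
                err += sum((model(xt[i]) - yt[i]) ** 2 for i in te)
            return err

        best = min(grid, key=inner_error)
        # Outer layer: refit with the selected penalty, predict the blinded subject.
        model = fit_ridge_1d(xt, yt, best)
        for i in test:
            preds[i] = model(x[i])
    return preds


# Toy data (assumed): four subjects, three visits each, one noisy feature.
subjects = ["s1"] * 3 + ["s2"] * 3 + ["s3"] * 3 + ["s4"] * 3
weeks = [10, 18, 28] * 4
noise = [0.1, -0.2, 0.3, -0.1, 0.2, 0.0, 0.2, 0.1, -0.3, 0.0, -0.1, 0.2]
feature = [0.5 * w + d for w, d in zip(weeks, noise)]
preds = nested_loso_predict(feature, weeks, subjects, grid=[0.0, 1.0, 10.0])
```

Because folds are defined by subject rather than by sample, no prediction is ever made by a model that saw another visit of the same woman, which is the property the section relies on.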

2.7 Empirical evaluation
The procedure described above was empirically compared against a number of standard multivariate algorithms. The same algorithms were used for the individual datasets as well as for stacked generalization (Fig. 5). The algorithms included Random Forest (Breiman, 2001), Gaussian Process (Williams and Barber, 1998), Support Vector Regression (Chang and Lin, 2011; Hsu and Lin, 2002) and XGBoost (Chen and Guestrin, 2016). The algorithms were compared using the default implementations provided in the packages described by Chen and He (2015), Karatzoglou et al. (2004) and Liaw and Wiener (2002). All algorithms were evaluated using the same two-layer leave-one-patient-out CV strategy. The cross-validated parameter space for Gaussian Process and Support Vector Regression included all available kernels (as described in Karatzoglou et al., 2004) and initial noise variance between 0.001 and 10 000. EN outperformed the other methods on most datasets, followed by Support Vector Regression. XGBoost outperformed the other algorithms on the microbiome dataset.

2.8 Model reduction
A bootstrapping procedure was used to reduce the number of features used in each model. As described in Aghaeepour et al. (2017), one hundred bootstrap iterations were performed on each dataset where 57 samples were drawn randomly and with replacement. Piece-wise regression between the number of features (calculated by applying a range of thresholds to the mean coefficient of each measurement across all bootstrap iterations) and the final results of the models was used to select the number of features for each modality (Oosterbaan, 1994).

2.9 Correlation network
The features from the reduced models were visualized using a graph structure. Each feature was represented by a node. The correlation structure between the features was extracted using a Minimum Spanning Tree (MST) where the width of each edge was proportional to the Spearman P-value of the correlation between the two nodes, on a log10 scale. The graph was visualized using the Fruchterman-Reingold layout (Fruchterman and Reingold, 1991).

2.10 P-value adjustment
All P-values were adjusted using Bonferroni’s method (adjusted P-value $= \min\{1, \text{raw P-value} \times n\}$), where $n$ is the number of features (Dunn, 1961).

2.11 Missing value interpolation
Missing values for all datasets were interpolated using a non-parametric multivariate model based on random forests. A model was trained for each feature of each dataset, and was subsequently used to estimate the missing values as described in Stekhoven and Bühlmann (2012).

3 Results

3.1 Modularity and size
Samples from 17 women for a total of 51 timepoints throughout pregnancy and 6 weeks postpartum were collected. Samples were analyzed for seven biological modalities: cell-free transcriptomics, antibody-based cytokine measurements in plasma and serum, microbiomic analyses (of vaginal swabs, stool, saliva and tooth/gum), mass cytometric analyses of whole blood, untargeted metabolomics and targeted proteomics analysis of plasma. These datasets produced different levels of modularity (as measured by the number of principal components needed to account for 90% variance of each dataset; Fig. 1C). The modularity of the datasets (Fig. 1C) was not correlated with the number of measurements available (Fig. 1B).

[Figure 1. Panels B and C plot, for each dataset (Cellfree RNA, Plasma Luminex, Serum Luminex, Microbiome, Immune System, Metabolomics, Plasma Somalogic), the number of measurements in log10 scale and the number of PCs for 90% variance, respectively.]

Fig. 1. (A) Overview of the study design. A total of 357 samples from 51 visits by 17 women were collected during three trimesters of pregnancy, as well as an additional 17 samples 6 weeks after delivery. Seven datasets were produced for each visit by each subject. (B) Data from each time point of each subject were analyzed using seven high-throughput assays, which produced different numbers of measurements. (C) The seven datasets had a range of correlations among the measured features. The internal correlation between features from each dataset was quantified using the number of Principal Components (PCs) needed to capture 90% variance (datasets in which most features are highly correlated would need fewer principal components)

3.2 Per-dataset analysis
An Elastic Net (EN) model was developed to predict the gestational age of pregnancy of each subject at each visit. A two-layer Cross-Validation (CV) procedure was used to both optimize the free parameters of the EN model (see Supplementary Fig. S1) and to ensure that predictions were made on samples that were not used for training model coefficients (see Fig. 2A and Section 2). Supplementary Figure S2 visualizes the predictions on the test samples for each modality versus the clinical estimations of gestational age. P-values of correlation with gestational age at time of sampling for the training and testing procedures are presented in Figure 2B and C, respectively. Plasma proteomics analysis using the SomaLogic platform produced the strongest predictive power (Fig. 2B and Supplementary Fig. S6). Results remained generally consistent between training and test sets (Fig. 2C). The datasets with a higher degree of independence between features (Fig. 1C) had a higher predictive power regardless of their size.

Due to the absence of true pre-pregnancy samples, we applied these models to samples collected 6 weeks postpartum as a surrogate for a non-pregnant state. At that time, some models (e.g. the immunologic and metabolomic models) recovered towards a state similar to a non-pregnant state, while others more closely reflected an early pregnant state or remained stable after delivery. This finding indicates that not all biological factors involved in pregnancy recover at similar rates (Fig. 2D).

Fig. 2. (A) Overview of the two-layer CV procedure. On the outer layer, a modified leave-one-out procedure is used in which all samples from the same subject (as opposed to just one sample) are left out as a blinded dataset. Within each fold, a second CV procedure is performed to optimize the free parameters of the EN model. Test samples for the inner and outer layers are visualized in red and green, respectively. The final training prediction is the median of predictions from all models that included that patient during their training (bottom), and the final blinded test set prediction comes from the only model that was blinded to it (top). See Section 2 for details. (B) and (C) The Spearman correlation P-values of the (B) training set and (C) test set results of the CV procedure for each dataset. (D) The models for each dataset applied to all samples including the postpartum visit 6 weeks after delivery. The average trend for each platform is visualized using kernel density estimation for smoothing. The delivery range is highlighted in gray. Some models quickly recover towards a non-pregnant status (below the first trimester) while others remain stable after delivery

3.3 Stacked generalization
A stacked generalization strategy was used to combine the predictive powers of the different omics datasets as described in Wolpert (1992). As illustrated in Figure 3A, an EN model was first trained on each dataset. Then, the estimations of gestational age produced by the seven independent models were merged using an additional EN model. Cross-validation was synchronized across all layers to ensure predictions were made on samples that had not been used for optimizing model coefficients. The free parameters of the models, as calculated using the inner CV procedure (see Section 2), are visualized in Supplementary Figure S1.

Ablation analysis, a procedure for investigating the path of dataset weights by iteratively retraining the stacked generalization model, was used to measure the relative contribution of each dataset to the final predictions (Fawcett and Hoos, 2016). This procedure was performed by iteratively removing the most important dataset from the mix (Fig. 4A). Importantly, for each iteration, the algorithm was able to recalculate new weights for the remaining datasets to partially compensate for any lost information. For example, after removal of the proteomic and metabolomic datasets, the algorithm significantly increased the weight of the predictions based on the immune system to compensate for the two removed datasets. Similar analysis in reverse order (Fig. 4B) revealed a minimal decrease in the predictive power when the most important dataset was preserved.

To enable biological exploration, the top hits from each model were extracted using a bootstrapping strategy for sensitivity analysis (see Section 2 for details) and visualized using a minimum spanning tree of Spearman correlations between the selected features on a Fruchterman-Reingold layout (Fruchterman and Reingold, 1991), in Figure 3B and C, respectively. This resulted in a set of 226 interrelated features (Supplementary Table S1), revealing statistically robust interactions within and between each omics dataset. A Minimum Spanning Tree (MST) representation organized these interactions into a branched structure in which the distance between two features is proportional to the strength of the correlation between them. Metabolomics, transcriptomics and proteomics features primarily segregated into three clusters (Fig. 3C). Cell-based features from the immune system were distributed across the MST graph, forming a link between other omics datasets rather than being confined to a single cluster. The MST graph highlighted the connectivity between biological processes measured in the plasma (metabolomic, transcriptomic and proteomic measurements) or local compartments (microbiomic data) and cell-specific immune responses measured in the peripheral blood compartment.

3.4 Biological hypothesis generation
Several biologically plausible and hypothesis-generating correlations between omics datasets emerged. Here, we highlight three of these data-driven hypotheses. In one instance, we illustrate how the integrative dataset can inform additional experiments that allow further exploration of the nature of observed interaction between different omics features.

With respect to the microbiomic data, a strong correlation was observed between changes in the composition of Neisseria bacterial species localized in the oral cavity as well as Bacteroides species in the gut and TCRγδ+ T cells. This finding is consistent with the unique role of TCRγδ+ T cells in mucosal immunity, particularly in the control of oral pathogens (Chien et al., 2014; Moutsopoulos and Konkel, 2018; Wu et al., 2014). Given increasing epidemiological evidence linking oral cavity dysbiosis and pregnancy-related complications, such as preterm labor and preeclampsia (Bassani et al., 2007; Boggess et al., 2003; Bosnjak et al., 2006; Hajishengallis, 2015; Herrera et al., 2007; Nabet et al., 2010), our results raise the hypothesis that the correlation between the changes in oral bacterial species and TCRγδ+ T cell frequencies may be disrupted in pathological pregnancies, such as preterm pregnancies.

With respect to the metabolomics dataset, the model revealed strong correlations between the plasma factor pregnanolone sulfate and the NF-κB signaling in myeloid dendritic cells (mDCs) and regulatory T cells (Tregs). Pregnanolone sulfate, or 3α,5β-tetrahydroprogesterone (3α,5β-THP), is an endogenous steroid biosynthesized from progesterone. Modulation of immune cell function by progesterone and its derivatives is well established (Druckmann and Druckmann, 2005). However, their roles in regulating the function of specific immune cell subsets during pregnancy are not fully
Fig. 3. (A) Stacked generalization analysis. The size of the boxes is proportional to the log 1 0 of the number of measurements in each dataset. The thickness of
the arrow is proportional to the  log 1 0 of P-value of a correlation test for gestational age; (B) The number of model components (x-axis) versus the P-value of
the Spearman correlation between each model and gestational age (y-axis). Lines represent the piece-wise regression fit for calculation of the number of features.
(C) Visualization of the most predictive features in a correlation network. The size of each node is proportional to the univariate correlation between that feature
and gestational age. Color represents the corresponding dataset

A B interaction contained the Chorionic Somatomammotropin


Hormone-1 (CSH-1), represented at the transcript (cell-free RNA
dataset) and protein (Somalogic dataset) levels, and the endogenous activity of the transcription factor STAT5 measured at the single-cell level in CD4+ and CD8+ T cell subsets. CSH-1 is known to bind to the prolactin receptor (Walsh and Kossiakoff, 2006), which signals through the JAK2/STAT5 signaling pathway (Gouilleux et al., 1994). As such, results from the integrative analysis inform a novel hypothesis that CSH-1 may directly activate the JAK2/STAT5 signaling pathway in CD4+ and CD8+ T cell subsets during pregnancy.

The strong correlation observed between CSH-1 RNA and protein levels, and STAT5 activity in T cells (R = 0.59, P = 4.40 × 10^-6) prompted further examination of this hypothesis in an in vitro model to determine whether CSH-1 can directly activate the JAK2/STAT5 signaling pathway in T cells. However, incubation of whole blood samples from non-pregnant or pregnant (Supplementary Fig. S3) women with CSH-1 did not induce the phosphorylation of STAT5 in CD4+ or CD8+ T cell subsets. On further inspection of the proteomic dataset, CSH-1 was found to belong to a community of tightly correlated plasma factors known to regulate the JAK/STAT signaling pathway. This community included the inflammatory cytokine interleukin-2 (IL-2). Supplementary Figure S3 shows that, in contrast to CSH-1 or prolactin, incubation of whole blood samples with IL-2 induced a robust STAT5 phosphorylation signal in all major T cell subsets. These results suggested that in the context of pregnancy, the progressive increase in intracellular STAT5 activity in T cell subsets is likely driven by changes in IL-2 rather than CSH-1.

Fig. 4. Ablation analysis to measure the collective predictive power of the model after removal of each dataset. At each iteration, the most (A) or least (B) important datasets were removed from stacked generalization. Color is proportional to the coefficients of the stacked generalization model. At each iteration, the algorithm was able to readjust the coefficients. This demonstrated that the algorithm could effectively use the remaining datasets to compensate for the latest removals.

[Figure 5 plot. Y-axis: -Log(pvalue), 0 to 20. X-axis: Cellfree RNA, Plasma Luminex, Serum Luminex, Microbiome, Immune System, Metabolomics, Plasma Somalogic, Stacked Generalization. Legend: Random Forest, XGboost, Gaussian Process, Support Vector Regression, Elastic net.]

Fig. 5. Empirical evaluation of elastic-net, random forest, XGboost, Gaussian Process and Support Vector Regression on each dataset, and the combination of all datasets. The hyperparameters of each method were tuned by the same two-layer leave-one-patient-out CV procedure for the prediction of gestational age on the test set. EN predominantly outperformed the other methods on most datasets, followed by support vector regression. XGboost outperformed the other algorithms on the microbiome dataset.

understood. The results thus generated a novel hypothesis that pregnanolone sulfate may regulate important aspects of mDC and Treg functions during pregnancy.

With respect to the proteomic dataset, a three-way interaction between the transcriptomic, proteomic and cytomic datasets was particularly interesting, as it highlighted a novel connection between previously reported models of molecular clocks of pregnancy. This

4 Discussion

We have described an analysis of seven high-throughput biological modalities during term pregnancy. An agnostic machine learning approach was used to evaluate the predictive power of each dataset for estimation of gestational age using biological signals. An additional machine learning layer was used to combine these estimations to further increase predictive power. Importantly, these datasets differed in both size and modularity. By taking this two-layer approach, we prevented higher-dimensional datasets from overwhelming the final model. This both increased predictive power and facilitated biological interpretation.
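The two-layer idea can be illustrated with a minimal sketch. It is not the implementation used in this study: ridge regression stands in for the elastic net so that the example needs only NumPy, the penalty is fixed rather than tuned by grid search, and every name in it (fit_ridge, subject_folds, stacked_predictions, make_toy_data) as well as the synthetic data are hypothetical. One first-layer model is fitted per dataset, a second layer combines one prediction per dataset, and all samples of a subject are always held out together.

```python
import numpy as np


def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression with an unpenalized intercept.
    (Assumption: ridge is swapped in here for the elastic net.)"""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    coef = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return coef, y_mean - x_mean @ coef


def predict(model, X):
    coef, intercept = model
    return X @ coef + intercept


def subject_folds(subjects):
    """Yield (train_idx, test_idx), holding out all samples of one subject."""
    for s in np.unique(subjects):
        held_out = subjects == s
        yield np.flatnonzero(~held_out), np.flatnonzero(held_out)


def stacked_predictions(datasets, y, subjects, lam=1.0, meta_lam=1e-6):
    """Out-of-fold stacked predictions. Each sample is predicted by models
    that never saw any sample from the same subject."""
    y = np.asarray(y, dtype=float)
    subjects = np.asarray(subjects)
    stacked = np.full(len(y), np.nan)
    for train, test in subject_folds(subjects):
        # Layer 1 (inner loop): out-of-fold predictions per dataset,
        # computed on the training subjects only.
        oof = np.zeros((len(train), len(datasets)))
        for in_train, in_test in subject_folds(subjects[train]):
            tr, te = train[in_train], train[in_test]
            for k, X in enumerate(datasets):
                oof[in_test, k] = predict(fit_ridge(X[tr], y[tr], lam), X[te])
        # Layer 2: weights that combine one prediction per dataset, so a
        # dataset with many features still contributes a single input.
        meta = fit_ridge(oof, y[train], meta_lam)
        # Refit each base model on all training subjects, then predict
        # the held-out subject and combine.
        test_preds = np.column_stack(
            [predict(fit_ridge(X[train], y[train], lam), X[test]) for X in datasets]
        )
        stacked[test] = predict(meta, test_preds)
    return stacked


def make_toy_data(seed=0, n_subjects=10, samples_per_subject=3):
    """Synthetic stand-in: two 'datasets' that each carry part of the signal."""
    rng = np.random.default_rng(seed)
    n = n_subjects * samples_per_subject
    subjects = np.repeat(np.arange(n_subjects), samples_per_subject)
    X1 = rng.normal(size=(n, 5))
    X2 = rng.normal(size=(n, 8))
    y = 3.0 * X1[:, 0] + 2.0 * X2[:, 0] + rng.normal(scale=0.1, size=n)
    return [X1, X2], y, subjects


if __name__ == "__main__":
    datasets, y, subjects = make_toy_data()
    pred = stacked_predictions(datasets, y, subjects)
    print("correlation with target:", np.corrcoef(pred, y)[0, 1])
```

Because the second layer sees exactly one column per dataset, a modality with thousands of features has no more inputs to the final model than a modality with a few dozen, which is the balancing effect the paragraph above describes.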

Using this approach, we estimated the gestational age of the fetus at the time of each sampling. The stacked generalization algorithm produced models more accurate than models derived from any individual dataset. Ablation analysis (Fawcett and Hoos, 2016) was used to study the impact of each dataset on the final predictions. Importantly, this analysis showed that by retraining the stacked generalization model, other datasets could partially compensate for the removal of a given dataset. Using sensitivity analysis, piece-wise regression and sequential feature reduction, each model was reduced to a limited number of required measurements. These were then used for correlation analysis, visualization and biological interpretation. These two complementary model reduction procedures lay the foundation for objective analysis to strike a balance between predictive power and assay/sampling costs in resource-poor settings (e.g. a more expensive assay which requires a larger sample size from a complex biopsy may be replaceable by two cheaper and more feasible assays).

The study provided an integrated biological model of maternal changes during pregnancy, highlighting the interconnectivity of multiple biological systems. Notably, strong correlations between metabolomic, proteomic and transcriptomic features and specific immune cell signaling responses pointed at biologically plausible interactions. For example, the model identified a strong relationship between the steroid hormone pregnanolone sulfate and the signaling behavior of mDCs and Tregs. mDCs and Tregs play a critical role in feto-maternal tolerance and the maintenance of pregnancy (Aluvihare et al., 2004; Erlebacher, 2013). Our data provide the basis for a novel hypothesis that pregnanolone sulfate plays a role in regulation of the function of these two cell types during pregnancy. Alternatively, recent evidence indicating that T cells can produce pregnenolone, the precursor of pregnanolone sulfate (Mahata et al., 2014), suggests that immune cells may be a cellular source of pregnanolone sulfate production, providing another hypothesis for the observed correlations.

The study also shows that the biological interpretation of observed interactions between two model components benefits from exploring the communities of features that strongly correlate with these model components. As such, the integrative model revealed a strong interaction between the protein factor CSH-1 and STAT5 activity in CD4+ T cells. However, a community of protein factors correlating with CSH-1 contained the cytokine IL-2, a canonical activator of the JAK/STAT5 signaling pathway in CD4+ T cells (Mahmud et al., 2013). Together with our in vitro data showing that stimulation with IL-2, but not with CSH-1, results in STAT5 phosphorylation in CD4+ T cells, these findings suggest that the interaction between CSH-1 and STAT5 activity in CD4+ T cells is likely indirectly mediated by IL-2. For example, activation of the PRL/CSH-1 receptor in cells other than T lymphocytes has been shown to promote the transcription of IL-2 (Sun et al., 2004). CSH-1 may thus be implicated in the paracrine regulation of T cell function through positive regulation of IL-2 gene expression in other immune or non-immune cell types. When applied to postpartum samples collected 6 weeks after delivery, these models demonstrated that different biological modalities return to a non-pregnant state at different rates, reflecting synchronized pacemakers (Diemert and Arck, 2018). This finding motivates detailed biological analysis of the role of the inter-pregnancy interval (Girsen et al., 2018) and history of preterm birth in adverse outcomes (Gaudillière et al., 2015).

Selecting the hyperparameters of an EN model is largely a balancing act between sparsity and accuracy. In complex biological datasets, this is often confounded by the intrinsic characteristics of data including size and modularity (Waldmann et al., 2013). To address this, a two-step CV procedure was used in this analysis. The inner layer enables optimization for the free parameters of the EN model using an exhaustive grid search (Supplementary Fig. S1). The outer layer ensures the generalizability of the results to previously unseen samples. To increase sample size, each sample extracted at a trimester from a single subject was treated as an independent data point. To ensure the models were not biased by the dependency between samples donated by the same subject, all three trimesters of a given subject were excluded together in the same CV fold. Therefore, reported results are based on models that had access to no samples from a subject in the test set. The samples used for testing purposes in all CV steps were synchronized across all models. Therefore, all test-set results (including those of the stacked generalization models) are reported only on samples that were blinded in all previous analyses.

This study has several limitations that have inspired our future plans. First, the number of subjects in this ‘proof-of-concept’ cohort was small relative to the number of measurements. In addition, recruitment from a single care center limited the diversity of the dataset. Despite this, we were able to capture the chronology of biological changes during pregnancy. This correlation was not driven by age, BMI, or parity (partial correlation test P > 0.05). However, given the racial disparities in pregnancy outcomes, replicating this analysis in more diverse cohorts is crucial. The March of Dimes Prematurity Center at Stanford University has already engaged in several international collaborations to directly address this. Similarly, the number of measurements was significantly larger than the cohort size, which increased the possibility of false positives. In addition to carefully designed cross-validation, feature reduction and clustering (e.g. Bien and Tibshirani, 2011) can be used to improve the predictive power of multivariate models in high-dimensional settings and enable exploration of more interactions between different datasets. These various approaches should be tested in an unbiased and collaborative setting (e.g. Aghaeepour et al., 2016; Stolovitzky et al., 2007) as large multiomics datasets become available. Finally, the current dataset included only one sample per trimester, and these samples were treated as independent datapoints. In the future, high-resolution sampling together with mixed-effect models (Gałecki and Burzykowski, 2013) will combine the information content of different timepoints to produce increasingly accurate predictions of pregnancy-related events using serial sampling throughout pregnancy.

In summary, our study revealed a chronology of biologically diverse events over the course of pregnancy. Our findings were enabled using seven high-throughput longitudinal biological assays of the same patient cohort. The computational pipeline introduced in this article can increase predictive power by combining datasets of various sizes and modularities in a balanced way. We expect this pipeline to be applicable to a wide range of studies beyond the field of pregnancy. Similarly, the dataset produced here provides a unique resource for future biological investigations. Particularly, this study can be used as a resource to identify correlates of any other features from one of the seven datasets that may be identified in future studies. Finally, by characterizing the biological chronology of normal pregnancy, this study provides the conceptual and analytical framework to analyze the complex interplays between various biological modalities that govern preterm birth and other pregnancy-related pathologies.

Acknowledgements

The authors would like to thank the members of the March of Dimes Prematurity Research Center at Stanford, as well as Joe Leigh Simpson, Jeff

Murray, Trevor Hastie, Ryan R. Brinkman and Holger H. Hoos for their feedback and inspiration.

Funding

This study was supported by the March of Dimes Prematurity Research Center at Stanford and the Bill and Melinda Gates Foundation (OPP1112382); additional funding was provided by the Department of Anesthesiology, Perioperative and Pain Medicine and Children Health Research Institute at Stanford University. N.A. was supported by an Ann Schreiber Mentored Investigator Award from the Ovarian Cancer Research Fund (OCRF 292495), a Canadian Institute of Health Research (CIHR) Postdoctoral Fellowship (CIHR 321510), an International Society for Advancement of Cytometry Scholarship, and the Fonds de Recherche du Québec–Nature et Technologies (FRQNT) under international internship project grant 211363. M.P.S. and K.C. were supported by NIH grant 5U54DK10255603. M.S. was supported by K01LM012381 and the Burrows Wellcome Fund. B.L. and T.W.C. were supported by the NOMIS Foundation. D.A.R. was supported by the Thomas C. and Joan M. Merigan Endowment at Stanford University.

Conflict of Interest: none declared.

References

Aghaeepour,N. and Hoos,H.H. (2013) Ensemble-based prediction of RNA secondary structures. BMC Bioinformatics, 14, 139.
Aghaeepour,N. et al. (2016) A benchmark for evaluation of algorithms for identification of cellular correlates of clinical outcomes. Cytometry Part A, 89, 16–21.
Aghaeepour,N. et al. (2017) An immune clock of human pregnancy. Sci. Immunol., 2, eaan2946.
Aghaeepour,N. et al. (2018) A proteomic clock of human pregnancy. Am. J. Obstetr. Gynecol., 218, 347.e1–347.e14.
Akavia,U.D. et al. (2010) An integrated approach to uncover drivers of cancer. Cell, 143, 1005–1017.
Aluvihare,V.R. et al. (2004) Regulatory t cells mediate maternal tolerance to the fetus. Nat. Immunol., 5, 266–271.
Bassani,D. et al. (2007) Periodontal disease and perinatal outcomes: a case-control study. J. Clin. Periodontol., 34, 31–39.
Bien,J. and Tibshirani,R. (2011) Hierarchical clustering with prototypes via minimax linkage. J. Am. Stat. Assoc., 106, 1075–1084.
Boggess,K.A. et al. (2003) Maternal periodontal disease is associated with an increased risk for preeclampsia. Obstetr. Gynecol., 101, 227–231.
Borgwardt,K.M. et al. (2005) Protein function prediction via graph kernels. Bioinformatics, 21, i47–i56.
Bosnjak,A. et al. (2006) Pre-term delivery and periodontal disease: a case–control study from croatia. J. Clin. Periodontol., 33, 710–716.
Breiman,L. (1996) Stacked regressions. Mach. Learn., 24, 49–64.
Breiman,L. (2001) Random Forests. Mach. Learn., 45, 5–32.
Callahan,B.J. et al. (2017) Replication and refinement of a vaginal microbial signature of preterm birth in two racially distinct cohorts of us women. Proc. Natl. Acad. Sci. USA, 114, 9966–9971.
Chang,C.-C. and Lin,C.-J. (2011) Libsvm: a library for support vector machines. ACM Trans. Intell. Syst. Technol, 2, 1–27: 27. 27: 1–May ISSN 2157-6904.
Chen,T. and Guestrin,C. (2016) Xgboost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp. 785–794.
Chen,T. and He,T. (2015) Xgboost: extreme gradient boosting. R Package Version 0.4-2.
Chien,Y-h. et al. (2014) γδ t cells: first line of defense and beyond. Annu. Rev. Immunol., 32, 121–155.
Dethlefsen,L. et al. (2007) An ecological and evolutionary perspective on human–microbe mutualism and disease. Nature, 449, 811–818.
Diemert,A. and Arck,P.C. (2018) Pregnancy around the clock. Trends Mol. Med., 24, 1–3.
DiGiulio,D.B. et al. (2015) Temporal and spatial variation of the human microbiota during pregnancy. Proc. Natl. Acad. Sci. USA, 112, 11060–11065.
Druckmann,R. and Druckmann,M.-A. (2005) Progesterone and the immunology of pregnancy. J. Steroid Biochem. Mol. Biol., 97, 389–396.
Dunn,O.J. (1961) Multiple comparisons among means. J. Am. Stat. Assoc., 56, 52–64.
Emilsson,V. et al. (2008) Genetics of gene expression and its effect on disease. Nature, 452, 423–428.
Erlebacher,A. (2013) Immunology of the maternal-fetal interface. Annu. Rev. Immunol., 31, 387–411.
Fawcett,C. and Hoos,H.H. (2016) Analysing differences between algorithm configurations through ablation. J. Heuristics, 22, 431–458.
Ferrero,D.M. et al. (2016) Cross-country individual participant analysis of 4.1 million singleton births in 5 countries with very high human development index confirms known associations but provides no biologic explanation for 2/3 of all preterm births. PloS One, 11, e0162506.
Fridley,B.L. et al. (2012) A bayesian integrative genomic model for pathway analysis of complex traits. Genet. Epidemiol., 36, 352–359.
Fruchterman,T.M. and Reingold,E.M. (1991) Graph drawing by force-directed placement. Softw. Pract. Exp., 21, 1129–1164.
Gałecki,A. and Burzykowski,T. (2013) Linear Mixed-Effects Model. Springer, New York, pp. 245–273.
Gaudillière,B. et al. (2015) Implementing mass cytometry at the bedside to study the immunological basis of human diseases: distinctive immune features in patients with a history of term or preterm birth. Cytometry A, 87, 817–829.
Ge,G. and Wong,G.W. (2008) Classification of premalignant pancreatic cancer mass-spectrometry data using decision tree ensembles. BMC Bioinformatics, 9, 275.
Girsen,A.I. et al. (2018) What factors are related to recurrent preterm birth among underweight women? J. Maternal-Fetal Neonatal Med., 31, 560–566.
Gouilleux,F. et al. (1994) Prolactin induces phosphorylation of tyr694 of stat5 (mgf), a prerequisite for dna binding and induction of transcription. EMBO J., 13, 4361–4369.
Hajishengallis,G. (2015) Periodontitis: from microbial immune subversion to systemic inflammation. Nat. Rev. Immunol., 15, 30–44.
Halilaj,E. et al. (2018) Physical activity is associated with changes in knee cartilage microstructure. Osteoarthritis Cartilage, 26, 770–774.
He,L. et al. (2013) Extracting drug-drug interaction from the biomedical literature using a stacked generalization-based approach. Plos One, 8, e65814–e65812.
Herrera,J.A. et al. (2007) Periodontal disease severity is related to high levels of c-reactive protein in pre-eclampsia. J. Hypertension, 25, 1459–1464.
Hershey,D.W. (2014) Fetal imaging: executive summary of a joint eunice kennedy shriver national institute of child health and human development, society for maternal-fetal medicine, american institute of ultrasound in medicine, american college of obstetricians and gynecologists, american college of radiology, society for pediatric radiology, and society of radiologists in ultrasound fetal imaging workshop. J. Ultrasound Med., 124, 836.
Holzinger,E.R. et al. (2014) Athena: the analysis tool for heritable and environmental network associations. Bioinformatics, 30, 698–705.
Hsu,C.-W. and Lin,C.-J. (2002) A comparison of methods for multiclass support vector machines. IEEE Trans. Neural Netw., 13, 415–425.
Karatzoglou,A. et al. (2004) S4 package for kernel methods in R. J. Stat. Softw., 11, 1–20.
Kim,D. et al. (2012) Synergistic effect of different levels of genomic data for cancer clinical outcome prediction. J. Biomed. Inf., 45, 1191–1198.
Lackritz,E.M. et al. (2013) A solution pathway for preterm birth: accelerating a priority research agenda. Lancet Global Health, 1, e328–e330.
Larranaga,P. et al. (2006) Machine learning in bioinformatics. Brief. Bioinf., 7, 86–112.
Liaw,A. and Wiener,M. (2002) Classification and regression by randomforest. R. News, 2, 18–22.
Liu,L. et al. (2012) Global, regional, and national causes of child mortality: an updated systematic analysis for 2010 with time trends since 2000. Lancet, 379, 2151–2161.

Mahata,B. et al. (2014) Single-cell rna sequencing reveals t helper cells synthesizing steroids de novo to contribute to immune homeostasis. Cell Rep., 7, 1130–1142.
Mahmud,S.A. et al. (2013) Interleukin-2 and stat5 in regulatory t cell development and function. JAK-STAT, 2, e23154–e23156.
Mankoo,P.K. et al. (2011) Time to recurrence and survival in serous ovarian tumors predicted from integrated genomic profiles. Plos One, 6, e24709.
Maynard,N.D. et al. (2008) Genome-wide mapping of allele-specific protein-dna interactions in human cells. Nat. Methods, 5, 307–309.
Menon,R. et al. (2016) Novel concepts on pregnancy clocks and alarms: redundancy and synergy in human parturition. Hum. Reprod. Update, 22, 535–560.
Moutsopoulos,N.M. and Konkel,J.E. (2018) Tissue-specific immunity at the oral mucosal barrier. Trends Immunol., 39, 276–287.
Nabet,C. et al. (2010) Maternal periodontitis and the causes of preterm birth: the case–control epipap study. J. Clin. Periodontol., 37, 37–45.
Oosterbaan,R.J. (1994) Frequency and regression analysis. Drainage Principles Appl., 16, 175–224.
Pan,W. et al. (2017) Simultaneously monitoring immune response and microbial infections during pregnancy through plasma cfrna sequencing. Clin. Chem., 63, 1695–1704.
Piening,B. et al. (2018) Integrative personal omics profiles during periods of weight gain and loss. Cell Syst., 6, 157–170.e8.
Ritchie,M.D. et al. (2015) Methods of integrating data to uncover genotype-phenotype interactions. Nat. Rev. Genet., 16, 85–97.
Rohart,F. et al. (2017) mixOmics: an R package for ’omics feature selection and multiple data integration. PLOS Comput. Biol., 13, 1–19.
Schadt,E.E. et al. (2005) An integrative genomics approach to infer causal associations between gene expression and disease. Nat. Genet., 37, 710–717.
Shabalin,A.A. (2012) Matrix eqtl: ultra fast eqtl analysis via large matrix operations. Bioinformatics, 28, 1353–1358.
Sharkey,A.J.C. (1996) On combining artificial neural nets. Connect. Sci., 8, 299–314.
Shaw,G.M. et al. (2018) Residential agricultural pesticide exposures and risks of preeclampsia. Environ. Res., 164, 546–555.
Shen,R. et al. (2009) Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis. Bioinformatics, 25, 2906–2912.
Singh,A. et al. (2016) Diablo-an integrative, multi-omics, multivariate method for multi-group classification. bioRxiv, 10.1101/067611.
Stekhoven,D.J. and Bühlmann,P. (2012) Missforest–non-parametric missing value imputation for mixed-type data. Bioinformatics, 28, 112–118.
Stevenson,D., M. of Dimes Prematurity Research Center at Stanford University School of Medicine. et al. (2013) Transdisciplinary translational science and the case of preterm birth. J. Perinatol., 33, 251–258.
Stolovitzky,G. et al. (2007) Dialogue on reverse-engineering assessment and methods. Ann. N.Y. Acad. Sci., 1115, 1–22.
Sun,R. et al. (2004) Expression of prolactin receptor and response to prolactin stimulation of human nk cell lines. Cell Res., 14, 67–73.
Tibshirani,R. (1996) Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodological), 58, 267–288.
Waldmann,P. et al. (2013) Evaluation of the lasso and the elastic net in genome-wide association studies. Front. Genet., 4, 270–211.
Walsh,S.T. and Kossiakoff,A.A. (2006) Crystal structure and site 1 binding energetics of human placental lactogen. J. Mol. Biol., 358, 773–784.
Wang,S.-Q. et al. (2006) Using stacked generalization to predict membrane protein types based on pseudo-amino acid composition. J. Theor. Biol., 242, 941–946.
Williams,C.K.I. and Barber,D. (1998) Bayesian classification with gaussian processes. IEEE Trans. Pattern Anal. Mach. Intell., 20, 1342–1351.
Wise,P.H. et al. (2017) Risky business: meeting the structural needs of transdisciplinary science. J. Pediatrics, 191, 255–258.
Wolpert,D.H. (1992) Stacked generalization. Neural Netw., 5, 241–259.
Woodward,L.J. et al. (2006) Neonatal mri to predict neurodevelopmental outcomes in preterm infants. N. Engl. J. Med., 355, 685–694.
Wu,R.-Q. et al. (2014) The mucosal immune system in the oral cavity-an orchestra of t cell diversity. Int. J. Oral Sci., 6, 125–132.
Yang,P. et al. (2010) A review of ensemble methods in bioinformatics. Current Bioinf., 5, 296–308.
Zhu,J. et al. (2008) Integrating large-scale functional genomic data to dissect the complexity of yeast regulatory networks. Nat. Genet., 40, 854–861.
Zhu,J. et al. (2012) Stitching together multiple data dimensions reveals interacting metabolomic and transcriptomic networks that modulate cell regulation. PLOS Biol., 10, e1001301–e1001319.
Zou,H. and Hastie,T. (2005) Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B (Stat. Methodol.), 67, 301–320.
