- Shorey-Kendrick, Lyndsey E;
- Davis, Brett;
- Gao, Lina;
- Park, Byung;
- Vu, Annette;
- Morris, Cynthia D;
- Breton, Carrie V;
- Fry, Rebecca;
- Garcia, Erika;
- Schmidt, Rebecca J;
- O’Shea, T Michael;
- Tepper, Robert S;
- McEvoy, Cindy T;
- Spindel, Eliot R;
- on behalf of program collaborators for Environmental influences on Child Health Outcomes
Background
Maternal cigarette smoking during pregnancy (MSDP) is associated with numerous adverse health outcomes in infants and children with potential lifelong consequences. Negative effects of MSDP on placental DNA methylation (DNAm), placental structure, and function are well established.
Objective
Our aim was to develop biomarkers of MSDP using DNAm measured in placentas (N=96) collected as part of the Vitamin C to Decrease the Effects of Smoking in Pregnancy on Infant Lung Function double-blind, placebo-controlled randomized clinical trial conducted between 2012 and 2016. We also aimed to develop a digital polymerase chain reaction (PCR) assay for the top-ranking cytosine-guanine dinucleotide (CpG) so that large numbers of samples can be screened for exposure at low cost.
Methods
We compared the ability of four machine learning methods [logistic least absolute shrinkage and selection operator (LASSO) regression, logistic elastic net regression, random forest, and gradient boosting machine] to classify MSDP based on placental DNAm signatures. We developed separate models using the complete EPIC array dataset and using the subset of probes also found on the 450K array, so that models exist for both platforms. For comparison, we developed a model using CpGs previously associated with MSDP in placenta. For each final model, we used model coefficients and normalized beta values to calculate placental smoking index (PSI) scores for each sample. Final models were validated in two external datasets: the Extremely Low Gestational Age Newborn observational study (N=426) and the Rhode Island Children's Health Study (N=237).
Results
Logistic LASSO regression demonstrated the highest performance in cross-validation testing with the lowest number of input CpGs. Accuracy was greatest in external datasets when using models developed for the same platform. PSI scores in smokers only (n=72) were moderately correlated with maternal plasma cotinine levels. One CpG (cg27402634), with the largest coefficient in two models, was measured accurately by digital PCR compared with measurement by EPIC array (R²=0.98).
Discussion
To our knowledge, we have developed the first placental DNAm-based biomarkers of MSDP with broad utility to studies of prenatal disease origins. https://doi.org/10.1289/EHP13838.
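The Methods state that PSI scores were calculated from model coefficients and normalized beta values, but the abstract does not give the formula. As a minimal sketch, assuming the score is the linear predictor of a penalized logistic model (intercept plus a coefficient-weighted sum of beta values), the calculation might look like the following. The function names, the inclusion of an intercept, and all numeric values are hypothetical illustrations, not the published model.

```python
import numpy as np


def placental_smoking_index(betas, coefficients, intercept=0.0):
    """Return one score per sample as intercept + sum(coef_i * beta_i).

    Assumption: the PSI is the linear predictor of a logistic model.
    betas        : array of shape (n_samples, n_cpgs), normalized beta values
    coefficients : array of shape (n_cpgs,), one weight per selected CpG
    """
    betas = np.atleast_2d(np.asarray(betas, dtype=float))
    coefficients = np.asarray(coefficients, dtype=float)
    if betas.shape[1] != coefficients.shape[0]:
        raise ValueError("number of CpG columns must match number of coefficients")
    return intercept + betas @ coefficients


def score_to_probability(score):
    """Map a linear-predictor score to a probability with the logistic function."""
    return 1.0 / (1.0 + np.exp(-np.asarray(score, dtype=float)))


# Hypothetical example: two samples, two CpGs, made-up weights.
example_betas = [[0.2, 0.8], [0.5, 0.5]]
example_coefficients = [1.0, -2.0]
example_scores = placental_smoking_index(
    example_betas, example_coefficients, intercept=0.5
)
example_probabilities = score_to_probability(example_scores)
```

With a sparse model such as logistic LASSO, only the CpGs with nonzero coefficients need to be measured, which is consistent with the abstract's interest in a low-cost assay for the top-ranking CpG.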