Prediction of Natural Product Classes Using Machine Learning and 13C NMR Spectroscopic Data
SH Martinez-Trevino, V Uc-Cetina… - Journal of Chemical …, 2020 - ACS Publications
Journal of Chemical Information and Modeling, 2020•ACS Publications
Structure elucidation of chemical compounds is a complex and challenging activity that requires expertise and well-suited tools. To assign the molecular structure of a given compound, 13C NMR is one of the most widely used techniques because of the broad range of structural information it provides. Because molecules found in nature can be grouped into natural product (NP) classes on the basis of structural similarities, we explore the possibility of predicting NP classes from 13C NMR data. Using freely available 13C NMR data of NPs, we trained four classifiers to predict eight common NP classes. The best performance was obtained with the XGBoost classifier, which reached F1-scores above 0.82. We also performed experiments with different percentages of positive samples, including the presence of glycosides. Furthermore, we tested cases outside the data set, yielding performances above 80% for most classes. For the chromans class, we restricted the test examples to the coumarin subclass, and the prediction accuracy increased to 100%.
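For readers unfamiliar with the F1-score quoted above, the following is a minimal sketch of how it is computed for a single class treated as a binary problem (1 = compound belongs to the class, 0 = it does not). The function name `f1_score` and the label lists are hypothetical and purely illustrative; they are not the authors' code or data.

```python
def f1_score(y_true, y_pred):
    """F1 for one binary class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        # No correct positive predictions: F1 is taken as 0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for one NP class (made up for illustration).
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
score = f1_score(y_true, y_pred)
```

Because F1 combines precision and recall, it is less sensitive than plain accuracy to the proportion of positive samples, which is relevant when that proportion is varied as described in the abstract.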