ChemRL-GEM: Geometry Enhanced Molecular Representation Learning for Property Prediction

Fang, Xiaomin; Liu, Lihang; Lei, Jieqiong; He, Donglong; Zhang, Shanzhuo; Zhou, Jingbo; Wang, Fan; Wu, Hua; Wang, Haifeng

arXiv:2106.06130v1 (cs)

[Submitted on 11 Jun 2021 (this version), latest version 23 Feb 2022 (v4)]

Abstract: Effective molecular representation learning is essential for molecular property prediction, a fundamental task in the drug and materials industries. Graph neural networks (GNNs) have recently shown great promise for molecular representation learning, and a few recent studies have demonstrated that self-supervised learning can be used to pre-train GNNs to overcome the shortage of labeled molecules. However, existing GNNs and pre-training strategies usually treat molecules as topological graph data without fully utilizing molecular geometry information, even though the three-dimensional (3D) spatial structure of a molecule, a.k.a. molecular geometry, is one of the most critical factors in determining its physical, chemical, and biological properties. To this end, we propose a novel Geometry Enhanced Molecular representation learning method (GEM) for Chemical Representation Learning (ChemRL). First, we design a geometry-based GNN architecture that simultaneously models the atoms, bonds, and bond angles in a molecule. Specifically, we devise double graphs for each molecule: the first encodes the atom-bond relations, and the second encodes the bond-angle relations. Second, on top of this GNN architecture, we propose several novel geometry-level self-supervised learning strategies that learn spatial knowledge from the local and global 3D structures of molecules. We compare ChemRL-GEM with various state-of-the-art (SOTA) baselines on different molecular benchmarks and show that ChemRL-GEM significantly outperforms all baselines on both regression and classification tasks. For example, the experimental results show an overall improvement of $8.8\%$ on average over SOTA baselines on the regression tasks, demonstrating the superiority of the proposed method.
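The double-graph idea in the abstract can be illustrated with a minimal sketch. This is one possible reading of the abstract, not the authors' implementation: it assumes that the atom-bond graph has atoms as nodes and bonds as edges (with bond length as an illustrative edge feature), and that the bond-angle graph has bonds as nodes and an edge wherever two bonds share an atom (with the angle between them as the edge feature). The function name `build_double_graphs`, the dictionary layout, and the toy coordinates are all hypothetical.

```python
import itertools
import numpy as np


def build_double_graphs(coords, bonds):
    """Hypothetical helper: build an atom-bond graph and a bond-angle graph.

    coords: (n_atoms, 3) array-like of 3D atom positions.
    bonds:  list of (i, j) atom-index pairs, one per bond.
    """
    coords = np.asarray(coords, dtype=float)
    bonds = [tuple(b) for b in bonds]

    # Atom-bond graph: nodes are atoms, edges are bonds.
    # Assumed edge feature: Euclidean bond length.
    lengths = [float(np.linalg.norm(coords[i] - coords[j])) for i, j in bonds]
    atom_bond_graph = {"num_nodes": len(coords), "edges": bonds, "edge_feat": lengths}

    # Bond-angle graph: nodes are bonds, edges link bonds sharing exactly one atom.
    # Assumed edge feature: the angle (in radians) at the shared atom.
    angle_edges, angles = [], []
    for a, b in itertools.combinations(range(len(bonds)), 2):
        shared = set(bonds[a]) & set(bonds[b])
        if len(shared) != 1:
            continue
        center = shared.pop()
        end_a = bonds[a][0] if bonds[a][1] == center else bonds[a][1]
        end_b = bonds[b][0] if bonds[b][1] == center else bonds[b][1]
        v1 = coords[end_a] - coords[center]
        v2 = coords[end_b] - coords[center]
        cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
        angle_edges.append((a, b))
        angles.append(float(np.arccos(np.clip(cos, -1.0, 1.0))))
    bond_angle_graph = {"num_nodes": len(bonds), "edges": angle_edges, "edge_feat": angles}
    return atom_bond_graph, bond_angle_graph


# Toy three-atom geometry (made-up coordinates, not a real molecule):
# atom 0 is bonded to atoms 1 and 2, which lie along perpendicular axes.
toy_coords = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
toy_bonds = [(0, 1), (0, 2)]
atom_bond, bond_angle = build_double_graphs(toy_coords, toy_bonds)
```

In this toy case the bond-angle graph has two nodes (one per bond) joined by a single edge carrying the right angle between them. A GNN in the spirit of the abstract would pass messages over both graphs; that part is not sketched here.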

Subjects:	Machine Learning (cs.LG); Chemical Physics (physics.chem-ph); Molecular Networks (q-bio.MN)
Cite as:	arXiv:2106.06130 [cs.LG]
	(or arXiv:2106.06130v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2106.06130

Submission history

From: Xiaomin Fang
[v1] Fri, 11 Jun 2021 02:35:53 UTC (1,026 KB)
[v2] Thu, 8 Jul 2021 05:36:07 UTC (1,026 KB)
[v3] Fri, 30 Jul 2021 01:40:19 UTC (1,035 KB)
[v4] Wed, 23 Feb 2022 03:32:35 UTC (1,035 KB)
