Parallel Accelerated Vector Similarity Calculations for Genomics Applications

Joubert, Wayne; Nance, James; Weighill, Deborah; Jacobson, Daniel

doi:10.1016/j.parco.2018.03.009

arXiv:1705.08210 (cs)

[Submitted on 23 May 2017 (v1), last revised 20 Apr 2018 (this version, v3)]

Title: Parallel Accelerated Vector Similarity Calculations for Genomics Applications

Authors: Wayne Joubert, James Nance, Deborah Weighill, Daniel Jacobson

Abstract: The surge in availability of genomic data holds promise for determining the genetic causes of observed individual traits, with applications to problems such as discovering the genetic roots of phenotypes, be they molecular phenotypes such as gene expression or metabolite concentrations, or complex phenotypes such as diseases. However, the growing sizes of these datasets and the quadratic, cubic, or higher-order scaling of the relevant algorithms pose a serious computational challenge that necessitates the use of leadership-scale computing. In this paper we describe a new approach to performing vector similarity metric calculations, suitable for parallel systems equipped with graphics processing units (GPUs) or Intel Xeon Phi processors. Our primary focus is the Proportional Similarity metric applied to Genome Wide Association Studies (GWAS) and Phenome Wide Association Studies (PheWAS). We describe the implementation of the algorithms on accelerated processors, methods used for eliminating redundant calculations due to symmetries, and techniques for efficiently mapping the calculations to many-node parallel systems. Results are presented demonstrating high per-node performance and parallel scalability, with rates of more than five quadrillion elementwise comparisons per second achieved on the ORNL Titan system. In a companion paper we describe corresponding techniques applied to calculations of the Custom Correlation Coefficient for comparative genomics applications.
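To make the central computation concrete, the following is a minimal serial sketch, not the paper's GPU implementation. It assumes the commonly used Czekanowski form of the 2-way Proportional Similarity metric, 2 * sum(min(u_i, v_i)) / sum(u_i + v_i), for non-negative vectors; the abstract itself does not state the formula. The function names, the sample vectors, and the choice to return 0.0 for two all-zero vectors are illustrative assumptions. Because the metric is symmetric in its two arguments, only pairs with i < j are evaluated, which is the simplest form of the redundancy elimination the abstract mentions.

```python
from itertools import combinations


def proportional_similarity(u, v):
    """2-way Proportional Similarity of two non-negative vectors.

    Assumed form: 2 * sum(min(u_i, v_i)) / sum(u_i + v_i).
    """
    if len(u) != len(v):
        raise ValueError("vectors must have equal length")
    numerator = 2.0 * sum(min(a, b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    # Assumption: define the similarity of two all-zero vectors as 0.0.
    return numerator / denominator if denominator else 0.0


def all_pairs(vectors):
    """Evaluate the metric once per unordered pair (i < j).

    The metric is symmetric, so the (j, i) entries are never computed.
    """
    return {
        (i, j): proportional_similarity(vectors[i], vectors[j])
        for i, j in combinations(range(len(vectors)), 2)
    }


# Hypothetical sample data: three vectors with three elements each.
vectors = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [3.0, 0.0, 1.0]]
result = all_pairs(vectors)
```

For n vectors this evaluates n(n-1)/2 pairs rather than n^2, while each pair still costs one elementwise pass over the vectors. That quadratic growth in the number of vectors is the scaling the abstract cites as the reason for using accelerated, many-node systems.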

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS); Performance (cs.PF)
MSC classes:	65Y05, 68W10
Cite as:	arXiv:1705.08210 [cs.DC]
	(or arXiv:1705.08210v3 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.1705.08210
Related DOI:	https://doi.org/10.1016/j.parco.2018.03.009

Submission history

From: Wayne Joubert [view email]
[v1] Tue, 23 May 2017 12:34:55 UTC (5,356 KB)
[v2] Sun, 18 Mar 2018 23:17:55 UTC (5,427 KB)
[v3] Fri, 20 Apr 2018 15:47:04 UTC (5,427 KB)
