RabbitMash: accelerating hash-based genome analysis on modern multi-core architectures

Z Yin, X Xu, J Zhang, Y Wei, B Schmidt, W Liu - Bioinformatics, 2021 - academic.oup.com
Motivation
Mash is a popular hash-based genome analysis toolkit with applications to important downstream analysis tasks such as clustering and assembly. However, Mash currently cannot fully exploit the capabilities of modern multi-core architectures, which leads to high runtimes on large-scale genomic datasets.
Results
We present RabbitMash, an efficient, highly optimized implementation of Mash that takes full advantage of modern hardware features, including multi-threading, vectorization and fast I/O. We show that our approach achieves speedups of at least 1.3×, 9.8×, 8.5× and 4.4× over Mash for the operations sketch, dist, triangle and screen, respectively. Furthermore, RabbitMash is able to compute the all-versus-all distances of 100 321 genomes in <5 min on a 40-core workstation, whereas Mash requires over 40 min.
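To give a sense of scale for the all-versus-all figure, the following is a minimal back-of-the-envelope sketch, not taken from the paper. It assumes that each unordered pair of genomes is compared exactly once (a lower-triangular distance matrix) and uses the reported time bounds of 5 min and 40 min as 300 s and 2400 s. The function names `pair_count` and `pairs_per_second` are hypothetical helpers introduced only for this illustration.

```python
def pair_count(n_genomes: int) -> int:
    """Number of unordered genome pairs, assuming each pair is compared once."""
    return n_genomes * (n_genomes - 1) // 2


def pairs_per_second(pairs: int, seconds: float) -> float:
    """Average comparison rate for a given wall-clock time."""
    return pairs / seconds


# Figures quoted in the abstract; the time limits are bounds, not measurements.
N_GENOMES = 100_321
RABBITMASH_MAX_SECONDS = 5 * 60   # reported as "<5 min"
MASH_MIN_SECONDS = 40 * 60        # reported as "over 40 min"

pairs = pair_count(N_GENOMES)

# RabbitMash finished in under 5 min, so its rate is at least this value.
rabbitmash_lower_bound = pairs_per_second(pairs, RABBITMASH_MAX_SECONDS)

# Mash needed more than 40 min, so its rate is at most this value.
mash_upper_bound = pairs_per_second(pairs, MASH_MIN_SECONDS)

print(f"pairs to compare:          {pairs:,}")
print(f"RabbitMash rate (at least): {rabbitmash_lower_bound:,.0f} pairs/s")
print(f"Mash rate (at most):        {mash_upper_bound:,.0f} pairs/s")
print(f"implied speedup (at least): {MASH_MIN_SECONDS / RABBITMASH_MAX_SECONDS:.1f}x")
```

Under these assumptions the workload is roughly 5.03 billion pairwise comparisons, and the implied speedup of at least 8× is in line with the 8.5× reported for the triangle operation.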
Availability and implementation
RabbitMash is available at https://github.com/ZekunYin/RabbitMash.
Supplementary information
Supplementary data are available at Bioinformatics online.
Oxford University Press